ABSTRACT

One of a regular series on the status and progress of
studies on the nature of speech, instrumentation for its
investigation, and practical applications, this report covers the
period of January 1-June 30, 1984. The 14 studies summarized in the
report deal with the following topics: (1) sources of variability in
early speech development, (2) invariance in phonetic perception, (3)
phonetic category boundaries, (4) the categorization of aphasic
speech errors, (5) universal and language particular aspects of
vowel-to-vowel coarticulation, (6) functionally specific articulatory
cooperation following jaw perturbation during speech, (7) formant
integration and the perception of nasal vowel height, (8) the
relative power of cues, (9) laryngeal management at
utterance-internal word boundary in American English, (10) closure
duration and release burst amplitude cues to stop consonant manner
and place of articulation, (11) the effects of temporal stimulus
properties on perception of the distinction between "sl" and "spl,"
(12) the physics of controlled collisions, (13) the perception of
intonation from sinusoidal sentences, and (14) the nature of
invariance. (FL)
SOURCES OF VARIABILITY IN EARLY SPEECH DEVELOPMENT*
Michael Studdert-Kennedy†



The present paper considers the origins of differences among children,
and within a child from time to time, in the early development of speech. The
bias of the paper is toward viewing these differences as special cases of
general variability in animal behavior and its development. Some variability
among children is surely genetic in origin (Lieberman, this volume); this is
the stuff of natural selection. Other variability is precisely what we expect
in a system growing from an open genetic program (Mayr, 1974) that depends on
loosely invariant properties of the environment to specify the course of
development (for elaboration, see below, and for an excellent brief
discussion, see Lenneberg, 1967, chap. 1). Finally, variability within a child
is a precondition of the adaptive biological process that we term "learning"
(cf. Fowler & Turvey, 1978). However, I will come to all these matters only
in the last section of the paper.

My first concern, and the topic of the early parts of the paper, is
apparent differences between capacities of infants and older children. Ferguson
(this volume) notes two main areas of research in child phonology: speech
perception in infants, and the sound systems of individual children aged 2-4
years, as shown by their speech productions. The relation between these two
bodies of work is, indeed, "problematic," as Ferguson remarks. For, on the
one hand, we have an infant apparently capable not only of discriminating
virtually every adult segmental contrast with which it is presented, but also
of discriminating speech sound categories across speakers and perhaps even
across intrinsic allophonic variants (for a comprehensive review, see Aslin,
Pisoni, & Jusczyk, 1983). On the other hand, we have an older child producing
a bewildering variety of sounds in its attempts to reproduce a particular
adult word. The discrepancy is not simply between perception and production.
For we also find the older child, even up to the age of 5 or 6 years, making
substantial numbers of perceptual errors on consonant contrasts (voicing,
nasality, place of articulation) that would, seemingly, have caused no
difficulty at all when it was an infant (see Barton, 1980, for a review). Of
course, these are cross-sectional comparisons. But the data are well
established, and would usually be taken to reflect the child's course of
development rather than sampling error.



*To appear in J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability
of speech processes. Hillsdale, NJ: Erlbaum, in press.

†Also Queens College and Graduate Center, City University of New York.
Acknowledgment. My thanks to Björn Lindblom and Peter MacNeilage for
conversations, to Charles Ferguson and Lise Menn for their papers, to John Locke
for his book, and my apologies to all of them for any misconstruals.
Preparation of the paper was supported in part by NICHD Grant No. HD-01994 to
Haskins Laboratories.
How then are we to resolve the paradox? The first step is to acknowledge
that different tasks place different demands on infant and older child: to
detect the difference between two patterns of sound (discrimination) is not
necessarily to recognize each pattern as an instance of a category
(identification) (Barton, 1980, p. 105). Moreover, even when the tasks assigned
to infant and older child are the same (i.e., discrimination), different
behavioral measures may give different results: recovery from habituation to a
nonsense syllable upon presentation of a new syllable, as measured by high
amplitude sucking or by heart rate, may not draw on the same capacities as
choosing which of two nonsense words refers to a particular wooden block
(Garnica, 1971). If we assume, as seems reasonable, that the older child has
not lost capacities for discriminating between sounds of the surrounding
language that it possessed as an infant, we must conclude that those capacities
are not sufficient for more explicitly communicative tasks (cf. Oller & Eilers,
1983; Oller & MacNeilage, 1983).

Yet the origin of the paradox is more than methodological. It also
arises because infant speech research has "...generally taken for granted a
phonological unit corresponding to the 'segment' [or, we may add, feature] of
contemporary phonological theories, even though researchers have sometimes
been familiar with the problems of relating such abstract units to the
processes of speech perception..." (Ferguson, this volume). Ferguson himself
has a different and, I believe, more fruitful approach. For rather than viewing
the child as "acquiring" its phonology from the adult, Ferguson sees the
adult's phonology as growing out of the child's (cf. Locke, 1983; Menyuk & Menn,
1979). Moreover, like Moskowitz (1973), and in accord with sound biological
principle (e.g., Waddington, 1966), Ferguson sees this growth as a process of
differentiating smaller structures from larger. The child does not build
words with phonemes: phonemes emerge from words. In short, Ferguson shuns
the preformationist view (long banished from embryology, but still thriving in
psychology) that attributes adult properties to the child; he seeks rather to
trace the epigenetic course from child to adult.

In the next few sections I will sketch a view of infant speech
development over roughly the first year of life that attempts to resolve the
"problematic" relation between the apparent capacities of infant and older
child. Broadly, my view is that two wrong turns have led into the impasse.
First, a too narrow notion of development has encouraged undue concentration
on the infant's "initial state." For the biologist, development begins with
the first division of the fertilized egg and ends with death. At each moment,
the organism is sufficient for adaptive response to current internal and
external conditions. Birth is certainly an occasion of abrupt discontinuity
and of radical changes in conditions, but prenatal and postnatal development
do not differ in principle: the infant's state at birth is simply the first
state that psychologists can conveniently study.

Of course, we may treat the whole process teleologically, seeing the end
in the beginning. That, in my view, is the second wrong turn. For the habit
of describing infants' presumed percepts (and articulations) in linguistic
terms has diverted attention from the central problem of early speech
development, namely, imitation. We have been easily diverted because it seems
natural (as, indeed, it is) that, if an adult speaks a word or grasps the air
with her hand, a young child can repeat the word or imitate the hand movements.
But how, in fact, does the child do this? What information in the acoustic or
optic array specifies the executed movements? How is the information
transduced into muscular controls? We are far from even imagining an answer to
the last question. But we may gain leverage on the former (the very question to
which the infant, learning to speak, must itself find an answer), if we couch
our descriptions in auditory and motoric, rather than in linguistic, terms.
We begin then with a brief summary of what is known about speech
perceptuomotor processes in adults.

Cerebral Asymmetry for Language in Adults

Brain lateralization offers a chink through which we may view the early
stages of imitative processes essential to language development. To justify
this claim my first assumption is that the association between lateralizations
for language and manual praxis in more than 90% of the human population (Levy,
1974) is not mere coincidence. Second, I assume that lateralization of hand
control evolved in higher primates to facilitate bimanual coordination by
assigning unilateral control to a bilaterally innervated system (MacNeilage,
Studdert-Kennedy, & Lindblom, 1984). Third, I assume that speech and language
exploited the already existing neural organization of the left hemisphere to
develop a characteristic structure, analogous in certain key respects to the
structure of coordinated hand movements.


I have no space to develop the analogy here (for elaboration, see
MacNeilage, 1983; MacNeilage, Studdert-Kennedy, & Lindblom, in press). In
any case, for present purposes, the needed assumption is simply that language
evolved in the left hemisphere for reasons of motor control. The assumption
is consistent with studies of aphasics (Milner, 1974), of split-brain patients
(Zaidel, 1978) and of the effects of sodium amytal injection (Borchgrevink,
1983; Milner, Branch, & Rasmussen, 1964), showing that in most right-handed
individuals the right hemisphere is essentially mute: the bilaterally
innervated speech apparatus is controlled from the left side.

My final assumption is that a capacity to perceive speech (more exactly,
to break its patterns into components matched to the motor components of
articulation) evolved alongside the motor system in the left hemisphere. The
assumption is consistent with numerous studies of dichotic listening (e.g.,
Kimura, 1961, 1967; Studdert-Kennedy & Shankweiler, 1970), and has drawn
further support from studies of split-brain patients. Levy (1974) showed that
only the left hemisphere of these patients can carry out the phonological
analysis needed to recognize written rhymes; Zaidel (1976, 1978) showed that,
while the right hemisphere may have a sizeable auditory and visual lexicon,
only the left hemisphere can carry out the auditory-phonetic analysis
necessary to identify synthetic nonsense syllables, or the phonological
analysis necessary to read new words.

In short, the stated assumptions and their supporting evidence justify
the claim that the speech perceptuomotor system is vested in the left
hemisphere of most normal right-handed individuals. Let us turn now to the
development of this system over the first year of life.

Cerebral Asymmetry for Speech in Infants

Perception. A number of perception studies have demonstrated dissociation
of the left and right sides of the brain for perceiving speech and non-speech
sounds at, or very shortly after, birth. For example, Molfese, Freeman and
Palermo (1975) measured auditory evoked responses, over left and right
temporal lobes, of 10 infants, ranging in age from one week to 10 months. Their
stimuli were four naturally spoken monosyllables, a C-major piano chord and a
250-4000 Hz burst of noise. Each stimulus lasted 500 ms and was presented
about 100 times, at randomly varying intervals. Median amplitude of response
was higher over the left hemisphere for all four syllables in nine out of ten
infants, higher over the right hemisphere for the chord and the noise in all
ten infants; the one child who responded to speech with higher right
hemisphere amplitude had a left-handed mother. Molfese (1977) has reported
similar asymmetries for syllables and pure tones in neonates.

Segalowitz and Chapman (1980) studied 153 premature infants with a mean
gestational age at testing of 36 weeks. They measured reduction of limb
tremor over a 24-hour period, at the end of a daily regimen of exposure to
5-minute spells of speech (the mother reading nursery rhymes) or music
(Brahms' "Lullaby"), presented six times a day at 2-hour intervals. Tremor in
the right arm (but not in the right leg, nor in the left arm or leg) was
significantly more reduced by speech than by music or by silence (control
group). The mechanism of the effect is not understood, nor whether it is due to
cortical or subcortical asymmetries.

Finally, Best, Hoffman, and Glanville (1982) tested forty-eight 2-, 3-,
and 4-month-old infants for ear differences in a memory-based dichotic task.
They used a cardiac orienting response to measure recovery from habituation to
synthetic stop-vowel syllables and to Minimoog simulations of concert A (440
Hz) played on different instruments. In the speech task, a single dichotic
habituation pair (either /ba-da/ or /pa-ta/) was presented nine times, at
randomly varying intervals. On the 10th presentation, one ear again received
its habituation syllable, while the other received a test syllable (either /ga/
or /ka/), differing in place of articulation from both habituation syllables.
An analogous procedure was followed in the musical note task.

The results showed significantly greater recovery of cardiac response for
right ear test syllables in the 3- and 4-month-olds, and for left ear musical
notes in all age groups. The authors suggest that right-hemisphere memory for
musical sounds develops before left-hemisphere memory for speech sounds, and
that the latter begins to develop between the second and third months of life.

Neither these nor any of the several other studies with similar findings
(see Best et al., 1982, for a brief review) indicates what properties of the
signal mark it as speech. We may note, however, that those properties are
evidently present in isolated syllables, natural or synthetic, and do not
depend on the melody or rhythm of fluent speech. Moreover, the results of Best
et al. (1982) invite the inference that infant speech sound discrimination,
attested by numerous studies, engages left-hemisphere mechanisms no less than
does adult speech sound discrimination.

Production. Evidence for early development of the production side of the
perceptuomotor link is tenuous, but suggestive. Kuhl and Meltzoff (1982)
showed that 4- to 5-month-old infants looked longer at the video-displayed
face of a woman articulating the vowel they were hearing (either [i] or [a])
than at the same face articulating the other vowel in synchrony. The
preference disappeared when the signals were pure tones, matched in amplitude
and duration to the vowels, so that infant preference was evidently for a match
between mouth shape and spectral structure. Similarly, MacKain,
Studdert-Kennedy, Spieker, and Stern (1983) showed that 5- to 6-month-old
infants preferred to look at the face of a woman repeating the disyllable they
were hearing (e.g., [zuzi]) than at the synchronized face of the same woman
repeating another disyllable (e.g., [vava]). In both these studies infant
preferences were for natural structural correspondences between acoustic and
optic information. Since these two sources of information have a common origin
in the articulations of the speaker, we may reasonably infer that the infant is
sensitive to information that specifies articulation. (For related work on
adult "lip-reading," see Campbell & Dodd, 1979; Crowder, 1983; McGurk &
MacDonald, 1976; Summerfield, 1979).

Two more items complete the circle. First, Meltzoff and Moore (1977)
showed that 12- to 21-day-old infants could imitate both arbitrary mouth
movements, such as tongue protrusion and mouth opening, and (of interest for
the development of ASL) arbitrary hand movements, such as opening and closing
the hand by serially moving the fingers. Here mouth opening was elicited
without vocalization; but had vocalization occurred, its structure would
necessarily have reflected the shape of the mouth. Kuhl and Meltzoff (1982) do,
in fact, report as an incidental finding of their study that 10 of their 32
infants "...produced sounds that resembled the adult female's vowels. They
seemed to be imitating the female talker, 'taking turns' by alternating their
vocalizations with hers" (p. 1140). Of course, we have no indication that this
incipient capacity, demonstrated under conditions of controlled attention in
the laboratory, is actively used by 5-month-old infants in the more variable
conditions of daily life.


The second item of evidence is a curious aspect of the study by MacKain
et al. (1983), cited earlier: infant preferences for a match between the
facial movements they were watching and the speech sounds they were hearing
were statistically significant only when they were looking to their right
sides. Fourteen of the eighteen infants in the study preferred more matches on
their right sides than on their left. Moreover, in a follow-up investigation of
familial handedness, MacKain and her colleagues learned that six of the
infants had left-handed first- or second-order relatives. Of these six, four
were the infants who displayed more left-side than right-side matches.

These results can be interpreted in the light of work by Kinsbourne and
his colleagues (e.g., Kinsbourne, 1972; Lempert & Kinsbourne, 1982). This
work suggests that attention to one side of the body may facilitate processes
for which the contralateral hemisphere is specialized. If this is so, we may
infer that infants with a preference for matches on their right side were
revealing a left hemisphere sensitivity to articulations specified by acoustic
and optic information. Thus, we have preliminary evidence that 5- to
6-month-old infants, close to the onset of babbling, already display the
beginnings of a speech perceptuomotor link in the left hemisphere.

Here we should strike a note of caution. The evidence reviewed up to
this point does not demonstrate that specialized phonetic processes are
occurring in the infant. In fact, whatever mechanisms for imitating
articulation may be developing in these early months seem to be no different,
in principle, than corresponding specialized mechanisms for imitating movements
of hand, face, and body. What distinguishes the speech perceptuomotor link, at
this stage of development, is, first, its locus in the brain, and second, its
modality. The capacity to imitate vocalizations seems to be peculiar to
certain birds, certain marine mammals, and man.
Speech Perception in Infants

0-6 months. As we have already remarked, and as is well known, infants
in the first six months of life discriminate almost any adult segmental
contrast on which they are tested. Particularly striking, in the early years of
this work, initiated by Eimas and his colleagues (Eimas, Siqueland, Jusczyk, &
Vigorito, 1971), was 1- and 4-month-old infants' discrimination of synthetic
syllables along a stop consonant voice-onset time continuum. Discrimination
was measured by recovery (or no recovery) of high-amplitude sucking on a
non-nutritive nipple, in response to a change in sound (or no change for a
control group), after habituation to repeated presentation of another sound.
Like adults, infants readily discriminated between acoustically different
items belonging to different (English) phonetic categories, but not between
acoustically different items belonging to the same category. This finding,
fortified by similar results on continua of, for example, stop consonant place
of articulation (Eimas, 1974), consonant manner (Eimas & Miller, 1980a,
1980b), and the [r]-[l] distinction (Eimas, 1975), encouraged the hypothesis
that "...these early categories serve as the basis for future phonetic
categories" (Eimas, 1982, p. 342).

However, there is a confusion here between two different types of
category. On the one hand, we have categories comprising more-or-less random
variations in the precise acoustic properties of a single syllable, spoken
repeatedly with identical stress and at an identical rate by the same speaker:
these are the patterns mimicked by a synthetic series, varied along a single
acoustic dimension. On the other hand, we have the categories of natural
speech, comprising intrinsic allophonic variants, formed by the execution of a
particular phoneme in a range of phonetic contexts, spoken with varying
stress, at different rates and by different speakers. The latter are
presumably the "future phonetic categories" to which Eimas refers, while the
former are auditory categories to which infants, chinchillas (for VOT: Kuhl &
Miller, 1978) and macaques (for place of articulation: Kuhl & Padden, 1983)
have been shown to be sensitive in synthetic speech studies (see also Kuhl,
1981). The proper interpretation of these studies would seem then to be that
infants (and an open set of other animals) can discriminate the several
contrasts tested, if they are presented in an invariant acoustic context.

Evidence for "phonetic" categories from studies of contrasts across
varying acoustic contexts differs depending on the nature of the variation.
Talker variations, at least on the few contrasts that have been tested, seem to
cause little difficulty for infant (e.g., Hillenbrand, 1983; Kuhl, 1979), dog
(Baru, 1975), cat (Dewson, 1964) or chinchilla (Burdick & Miller, 1975).
Cross-talker categories, then, seem to be auditory rather than phonetic. (We
may note, in passing, that such findings present a puzzle for accounts of
speaker normalization that rest on the listener's presumed knowledge of the
speaker's phonetic space [e.g., Gerstman, 1968; Ladefoged & Broadbent,
1957].)

Studies of contrasts across variations in phonetic context have given
less consistent results. Warfield, Ruben, and Glackin (1966) trained cats to
discriminate between the words cat and bat, but found no transfer of training
to other minimal pairs, beginning with the same segments. Holmberg, Morgan
and Kuhl (1977) studied fricative perception in 6-month-old infants. They
used an operant head-turning paradigm, in which the infant was conditioned to
turn its head for visual reinforcement when repeating sounds from one category
were changed to repeating sounds from another. They found that infants
discriminated [f]/[θ] and [s]/[ʃ] across variations in vowel context (e.g.,
[fa], [fi], [fu]) and syllable position (e.g., [fa]/[af]). Kuhl (1980)
reports similar results for an infant trained to discriminate [d]/[g].

Katz and Jusczyk (1980), cited in Jusczyk (1982), reasoned that a more
stringent test of infant phonetic categorization would be to show that infants
more readily learn to discriminate between (that is, to generalize within)
phonetically-based groupings than arbitrary groupings of the same syllables.
In a head-turning study of 6-month-old infants, they found that most infants
learned to discriminate between sets of syllables, paired for consonant onset,
but differing in vowel (e.g., [bi] and [be] vs. [di] and [de]), but not
between sets, arbitrarily paired, differing in both consonant and vowel (e.g.,
[be] and [di] vs. [bi] and [de]). However, none of the infants learned to
discriminate either phonetic or arbitrary groupings of [b] and [d] followed by
four vowels ([i, e, o, y]). Jusczyk (1982) interprets the results as
providing "some ... weak support for ... perceptual constancy for stop
consonant segments occurring in different contexts" (p. 378).

Before commenting on this study, let us compare its results with those of
Miller and Eimas (1979), who used a similar set of stimulus materials to ask
a different experimental question: Are infants sensitive to the structure of
syllables? That is to say, do infants perceive syllables holistically, as
seamless, undifferentiated patterns, or do they perceive the structure of
syllables, analyzing them into their component segments (consonants and
vowels)? Miller and Eimas used a high-amplitude sucking paradigm to test 2-,
3-, and 4-month-old infants. One group of infants successfully discriminated
between sets of syllables, paired for consonant onsets, but differing in vowel
([ba] and [bæ] vs. [da] and [dæ]), as did the infants of Katz and Jusczyk.
However, another group also discriminated between sets arbitrarily paired,
differing in both consonant and vowel ([ba] and [dæ] vs. [bæ] and [da]), as
the infants of Katz and Jusczyk did not. Miller and Eimas interpreted their
positive outcome as evidence that infants are sensitive to the segmental
structure of syllables.

A similar conflict in results emerges at a "feature" level when we
compare a study by Hillenbrand (1983) with the second and third experiments of
Miller and Eimas (1979). Hillenbrand used a head-turning paradigm to test the
capacity of 6-month-old infants to discriminate between sets of syllables
differing on a single feature (oral-nasal, as in [ba] and [da] vs. [ma] and
[na]) and sets of syllables differing on arbitrary combinations of two
features (oral-nasal and place of articulation, as in [ba] and [ŋa] vs. [na]
and [ga]). He found that infants were significantly better at discriminating
the single feature "phonetic" groups than the arbitrary double feature groups.
He concluded that infants were sensitive to the auditory correlates of
consonantal features. Miller and Eimas (1979), on the other hand, in two
further experiments of their study, tested 2-, 3- and 4-month-old infants, with
a high amplitude sucking procedure, on single-feature phonetic groups analogous
to those of Hillenbrand (voicing vs. place of articulation, oral-nasal
vs. place of articulation), and on the corresponding double feature sets where
the two "features" were arbitrarily combined. Pooling data from the two
experiments, they found that infants assigned to experimental conditions
displayed significantly more recovery from habituation than control infants,
and that there was no significant difference in recovery for the two types of
syllable set. Miller and Eimas (1979) concluded from the lack of reduction in
performance across set types that infants were sensitive to the structure of
consonantal segments, that is, to their particular combinations of "features."

We have then a conflict in data from the three studies: 2- to
4-month-old infants, tested with high amplitude sucking, discriminate between
arbitrary sound classes that are indiscriminable for 6-month-old infants,
tested with operant head-turning. If the results are valid, and not mere
sampling error, we have a paradox similar to that for infants and older
children with which we began. We may resolve the paradox on the same two
fronts. Methodologically, we must acknowledge a commonplace of psychophysical
testing for many years (e.g., Woodworth, 1938, chap. 17): different behavioral
measures may give different results, even in the same individual, at roughly
the same time. Moreover, since demonstrating a capacity takes precedence over
demonstrating its absence, and since 6-month-old infants are unlikely to have
lost capacities for discriminating among the sounds of the surrounding
language that they possessed at 3 months, we must conclude that high-amplitude
sucking is a more sensitive measure of infant discriminative capacity than
operant head-turning. Thus, the two head-turning studies failed to reveal
infant conditioning to arbitrary groupings of syllables because task difficulty
and behavioral measure interacted, a possibility raised by Jusczyk (1982, p.
379). The attempt to develop a more stringent test of infant consonant
categorization across vowel contexts than that used by Holmberg et al. (1977)
for fricatives was therefore not successful.

Beyond the methodological issue lies the matter of interpretation.
Consider, first, the conclusion from Miller and Eimas (1979) that infants are
sensitive to the segmental structure of syllables and the featural structure
of segments. Unfortunately, the conclusion is not forced by the data, since,
as Aslin et al. (1983) point out, an infant discriminating, say, [ba] and [na]
from [da] and [ma], has simply to detect that one (or both) of the syllables
in the second set is different from either of the syllables in the first set.
In other words, the infant can discriminate the patterns holistically without
analysis. Miller and Eimas (1979) recognize this fact ("...we know of no way
to make this distinction [holistic/analytic] experimentally with infant
subjects"), but justify their preference for the analytic interpretation,
because "There is...rather extensive behavioral as well as neurophysiological
evidence for an analysis into components or features in human and non-human
pattern perception" (both quotations from p. 355, footnote 2). I do not doubt
this evidence, but it does not justify our attributing analytic capacities to
the 3-month-old, particularly when, by doing so, we set up a paradoxical
discrepancy between the capacities of infant and older child.

Consider, next, the evidence that infants can form "phonetic" categories
across a variety of acoustic contexts. Here again the data are
overinterpreted. For, in fact, since every phonetic contrast is marked by an
acoustic contrast (if it were not, how would the infant learn to talk?),
phonetic and auditory perception cannot be dissociated in the infant (though
they can be in the adult: Best, Morrongiello, & Robson, 1981; Best &
Studdert-Kennedy, 1983; Liberman, Isenberg, & Rakerd, 1981; Mann & Liberman,
1983; Schwab, 1981). This fact is recognized by Miller and Eimas (1979, p. 365),
and by Aslin et al. (1983, passim). What we are left with then is evidence that
infants, in their first six months of life, can detect auditory similarities
across certain adult phonetic categories. Incidentally, apart from the study
of cats mentioned above (Warfield et al., 1966), we have no evidence, so far
as I know, that other animals cannot do the same. Of course, proving the null
hypothesis on animals is a thankless task.

Finally, we may ask what role categories, whether auditory or phonetic,
are presumed to play in the infant's learning to speak. Eimas (1982) argues
that "...the acquisition of the complex rule systems of linguistics requires
that the young child treat all instantiations of a phonetic category as
members of a single equivalence class" (p. 346). He adds in a footnote: "...if
the child treats each possible member of the two voicing categories of English
as separate entities and not as perceptually identical events or at least as
members of the same equivalence class, then acquisition of the rule for
pluralization will necessarily be painfully slow, if ever learned" (p. 346,
footnote 5). Eimas goes on to justify the search for perceptual constancy in
infants on grounds of parsimony, because "...it would effectively eliminate
explanations based on receptive experience" (p. 346).

There are several things wrong here. First is the implication that a
process of development relying on experience to direct its course is somehow
unparsimonious, perhaps even not "biological." In fact, just the reverse is
true. Precisely because full genetic specification is costly, even the
lowliest behaviors of non-human animals may depend on broadly invariant external
conditions to guide development (see Immelmann, Barlow, Petrinovich, & Main,
1981, passim; Lenneberg, 1967, chap. 1; Mayr, 1974, and the brief discussion
below). Second, the notion of rule is prescriptive, as though speakers
applied rules much as they do in a game of chess. In fact, a phonological rule
is simply a description of regularities in speech; the processes by which
these regularities arise are completely unknown (for excellent discussions,
see Menn, 1980; Menyuk & Menn, 1979). Finally, once again, the outcome of
development (the formation of phonological structures that control adult
speaking) is posited to be already in place at a time when development has
scarcely begun. I do not doubt that infants can form auditory categories, but
there is no evidence that this capacity is either needed for or brought to
bear on early speaking.³ If it were, we would be hard put to explain the
word-by-word development of adult phones that Ferguson (this volume)
describes, or the relatively slow accumulation of the first fifty (or so)
words. We may indeed suspect that the emergence of auditory-motoric
categories, around the beginning of the third year, is a factor in triggering the
explosive growth of the child's vocabulary (at an average rate of perhaps 5-10
words a day) over the next four or five years (Miller, 1977, pp. 150 ff.).

In short, we can resolve the paradoxical discrepancy between the
capacities of infants and older children, if we refrain from regarding
precursors of a behavior as instances of the behavior itself. No doubt, infant
kicking and stepping (when held erect) are precursors of walking and, with
normal growth in an appropriate environment, will develop into walking
(Thelen, 1983). But infant kicks and steps are not strides.

7-12 months. None of the foregoing should be interpreted as claiming
that phonetically relevant development of the infant's perceptual system is 
not going forward during the first six months of life. However, the first 
(and still sparse) behavioral evidence of such development comes from older 
infants. 

Eimas (1975) showed that 4- to 6-month-old English infants discriminated
between English [r] and [l]. On the assumption that Japanese infants would
have done the same, and given the well-known fact that native Japanese
speakers, who know no English, do not make this discrimination (Miyawaki et al.,
1975), Eimas suggested that learning the sound system of a language may entail
loss of the capacity to discriminate contrasts not used in the language.
Similar suggestions have been made by Aslin and Pisoni (1980), Locke (1983), and
others.

Werker and her colleagues have traced the onset of perceptual loss
to the second six months of life, a period when the infant is perhaps first
attending to individual words and the situations in which they occur (cf.
Jusczyk, 1982; MacKain, 1982). Their initial finding was that seven-month-old
Canadian English infants, tested in a head-turning paradigm, could
discriminate between naturally spoken contrasts in Hindi; English-speaking
adults could not. Werker (1982) followed this up by tracking the decline of
discriminative capacity in cross-sectional and longitudinal studies. She used
a conditioned head-turning paradigm to test three groups of infants on two
non-English sound contrasts: Hindi voiceless, unaspirated retroflex vs.
dental stops (cf. Locke, 1983, pp. 90-92), and Thompson (Interior Salish, an
American Indian language) voiced, glottalized velar vs. uvular stops. On the
Hindi contrast, the numbers of infants successfully discriminating were: 11/12
at 6-8 months, 8/12 at 8-10 months, 2/10 at 10-12 months; for the Thompson
contrast the results were essentially the same. (An infant was classified as
having failed to discriminate only if it had successfully discriminated an
English contrast both before and after failure on a non-English contrast.)
Finally, Werker (1982) reports longitudinal data for six Canadian English
infants on the same two non-English contrasts. All six discriminated both
contrasts at 6-8 months, but at 10-12 months none of them made the
discrimination. By contrast, the one Thompson and two Hindi infants so far tested at
10-12 months could all make the called-for discrimination in their own
language.

Perceptual loss is not permanent, since capacity can be recovered by
adults learning a new language (e.g., MacKain, Best, & Strange, 1981). Nor
can the effect be general, since sufficiently salient foreign contrasts can
presumably be discriminated even by adults. We may suspect then that loss is
focused on relatively fine auditory contrasts, specifying slight differences
in the space-time coordinates of a single articulator's movements, and that it
arises as a side-effect (lateral inhibition!) of the infant's developing
"attention" to closely related contrasts in its own language. This is not to
suggest that the younger infant is not "attending" to speech during its early
months. Rather, its search for meaning and communicative function
(Trevarthen, 1979) may initially be guided by the rhythm and melody of speech
(e.g., Mehler, Barrière, & Jassik-Gerschenfeld, 1976). Only when these larger
patterns have begun to take form (Menn, 1978a), are the infant's capacities
for segmental discrimination, readily demonstrated in the laboratory, brought
to bear on the speech it hears at home.

Speech Production in the Infant 

The infant, by definition, does not speak (Latin: infans, not speaking).
But there is now ample evidence that the discontinuity between babble and
speech, posited by Jakobson (1968), is not real. Oller (1980) provides a
taxonomy of the emerging stages from phonation (0-1 month) to variegated
babbling (11-12 months). Oller, Wieman, Doyle, and Ross (1975) describe
similarities between patterns of babbling and early speech (cf. MacNeilage,
Hutchinson, & Lasater, 1981). Vihman, Macken, Miller, Simmons, and Miller (in
press) demonstrate parallels in the distribution and organization of sounds in
speech and babble during the period (roughly 9-15 months) when they overlap.

What is the origin of this continuity? The first possibility is that the
sound distributions of babble and early speech are similar because the infant
begins to learn the sounds of the language around it and to practice them
during its second six months of life. Locke (1983, chap. 1) has marshalled
evidence against this view. First, he has collated data on the babbling of 9- to
12-month-old infants growing up in 14 different language environments,
distributed across some half dozen language families (Locke, 1983, Table V.3,
p. 10). These infants were certainly old enough to have begun to discover the
sound patterns of their languages and, indeed, if the data on perceptual loss
reviewed above have any generality, perceptual discovery had already begun.
Yet of the n3 consonantal sounds entered in Locke's table over 85% correspond
to one of the twelve most frequent sounds in the babbling of English children:
a strikingly homogeneous distribution. Second, Locke has reviewed some dozen
studies that have looked for drift in the sounds of infant babbling, during
the second six months of life, toward the sounds of the surrounding language.
Most of the studies either found no evidence of drift or were inconclusive.
Finally, Locke has reviewed available studies on the babbling of deaf and
Down's syndrome infants. Despite the common belief that deaf babbling fades
before the end of the first year, several studies agree that it may continue
well into early childhood (5-6 years). But what is remarkable is that the
developmental course of babbling up to 12 months is similar in deaf and
hearing infants, and, incidentally, in Down's syndrome infants. For example, the
relative proportions of labial, alveolar and velar consonants follow
essentially the same course: only after the 12th month does the expected
preponderance of labial movements in deaf children begin. The three strands
of evidence converge on a process of articulatory development, independent of
the surrounding language and common to all human infants.

We are left, then, with the second possible account of the continuity
between babble and speech, namely that, as Locke proposes, the phonetic
proclivities of adults and infants are similar. Both are largely determined
by anatomical and physiological constraints on the signaling apparatus. What
these constraints may be has only recently come under scrutiny (e.g., Kent,
1980; Lindblom, 1983; Ohala, 1983).

Of course, this hypothesis raises immediately the question of language
change: if all adult speakers develop from a common infant base, why do
languages differ? The question is too large, and my competence too small, for
adequate treatment here. However, I note several points. First, as Locke
(1983) has shown, many infant biases (e.g., for open rather than closed
syllables, for stops over fricatives, for singleton consonants over clusters, and
so on) are indeed preserved by many groups of adult speakers (i.e.,
languages), and it is this fact that the continuity of babble and speech
reflects. At the same time, infant preferences are not rigid, because, as
Darwin taught, no animal structure specifies a unique function: A structure
(e.g., the vocal apparatus) permits an unspecifiable, though presumably
limited, range of functions, and the natural variability of behavior offers this
range for selection. Second, infant articulatory capacities are a subset of
the capacities of mature speakers. As skill develops, the range of response,
available for selection by a variety of sociocultural forces, widens.
Certainly, the exact course of historical change will never be fully specified
for language, any more than for, say, clothing, cuisine, or social
organization. Nonetheless, there would seem to be no reason, in principle, why we
should not develop a cultural-evolutionary account of language diversity
(Lindblom, 1984), compatible with relatively fixed infant articulatory
proclivities.

The conclusion I want to draw is that perceptual and motor
development of speech over the first year of life, as manifested in infant
behavior, may justly be seen as parallel, independent processes. No doubt,
physiological changes in the perceptual and motor centers of the left
hemisphere are taking place to prepare for the ultimate linkage between the two
systems. These processes may be analogous to those in songbirds, such as the
marsh wren, in which the perceptual template of its species' song is laid down
many months before it begins to sing (Kroodsma, 1981). But behavioral
evidence of the perceptuomotor link appears only with that song, just as
behavioral evidence of the link appears in the infant only with its first imitation
of an adult sound.

From Babbling to Speech

The transition from babbling to speech is a murky period. At this stage
we see the first clear evidence of a perceptuomotor link, but know little
about what the child perceives. Even when the perceptual data come in, it
will be a delicate task to determine their relevance. For as we have noted, a
capacity demonstrated in the laboratory does not tell us how, or even if, that
capacity is put to use in learning to speak. Consequently, we may have to
place as much weight on shaky inference from the child's productions as on
firm evidence from perceptual studies.

A further difficulty at this stage is that we find it increasingly
difficult to refrain from describing the child's productions by means of phonetic
transcriptions. Of course, we do not want to refrain: transcription is our
readiest mode of description, because children have vocal tracts very like
adults' and make sounds like adults' sounds. Yet transcription is a
double-edged blade. For it is precisely in order to understand the apparently
segmented structure of speech (and the resulting adult capacity to transcribe)
that we are studying its ontogeny. As is well known, phonetic segments are
not readily specified either in articulation or in the signal, so that their
functional reality has had to be inferred, in the first instance, from adult
behaviors, such as errors of perception (e.g., Browman, 1978) and production
(e.g., Shattuck-Hufnagel, 1983), backward talking (Cowan, Leavitt, Massaro, &
Kent, 1982), aphasic deficits (e.g., Blumstein, 1981) and, not least, use of
the alphabet. By relying on a descriptive apparatus that derives from
characteristics of mature speakers, we put ourselves in danger of attributing
to the child properties it does not yet possess.

Despite these difficulties headway has been made, and a view of the child
as something other than a preformed adult is beginning to emerge (see
especially Menn, 1978a, 1978b, 1980, 1983; Menyuk & Menn, 1979). A striking
aspect of this view, though not, I think, a surprising one, is the lavish
variability of the child's productions. In these last few paragraphs, I will
briefly consider how we might approach this variability.

Variability within a child. Ferguson (this volume) presents compelling
arguments for regarding the word as the unit of contrast in early speech; he
defines a word as "...any apparently conventionalized sound-meaning pair."
The definition is important, because it draws attention to the fact that a
word is not simply a pattern of sound, but a pattern of sound appropriate to a
particular situation (Menyuk & Menn, 1979). To discriminate one word from
another, to recognize a word and to use it correctly, therefore entail
discriminating and recognizing various non-linguistic properties of a
situation. Thus, a child's failure to discriminate or recognize a word in a
perceptual test may reflect non-linguistic as much as linguistic factors.
Moreover, many of the child's spoken variations may reflect variability in the
situations in which the child has heard the word and in the varying salience
of its phonetic properties in those situations: the same adult word may then
be a different word to the child in different situations.

Nonetheless, highly variable productions of a given word do occur within
essentially the same situation. Ferguson (this volume; Ferguson & Farwell,
1975, p. 423, footnote 8) lists ten different attempts by a child (K at
approximately 1 year, 3 months) to say pen within one half-hour session.
Ferguson comments: "She seemed to be trying to sort out features of nasality,
bilabial closure, alveolar closure, and voicelessness." Waterson (1971)
describes numerous such instances for her child, P, in similar phonetic terms,
noting as a common occurrence that "features" lose their order and become
recombined into patterns quite unlike the adult model. Perhaps, however, we
would do well to avoid featural terminology. We might attempt a more direct
articulatory description as do Menyuk and Menn (1979), describing protowords
of one of Menn's (1978a) subjects, Jacob: "Jacob was varying the timing of
front-back articulations against the timing of lowering and raising the
tongue" (p. 60). Of course, this is little more than a gloss on phonetic
transcriptions. Yet, in the absence of cineradiographic or even acoustic
records, the gloss may "...help us see more clearly what it is the child needs
to learn and to look at it in a way less coloured by our knowledge of mature
linguistic behavior" (Menyuk & Menn, 1979, p. 61; cf. Kent, in press). For
we then see the speaking of a word not as a bundling of features into
concatenated segments, but as a distribution of interleaved movements of
articulators over time (Browman & Goldstein, ms.). In the adult, repeated
coordination of particular movements in recurrent patterns has crystallized into
structures that form the phonological elements of the language. For the child
the movements have yet to be organized.

Here three points deserve emphasis. First, despite the variability of a
child's productions, they also display surprising accuracy. The phone classes
of Ferguson and Farwell (1975) show much variability in voicing and
manner (due perhaps to unskilled timing of closure and release), yet remarkable
homogeneity in place of articulation. Also, K's attempts at pen did not
include, for example, [gAk]: Almost every attempt included some recognizable
property of the adult word. This means that the acoustic structure of adult
words specifies for the child at least some rough pattern of configurations of
the vocal tract, necessarily the product of a specialized perceptuomotor link.
Yet, second, the link is not precisely predetermined: it must develop. Not
only the movements, but their relative timing and sequencing must develop.
These are complex processes that almost certainly require active movement for
their neural control structures to take form. Perhaps, indeed, it is the
normal function of babbling to promote growth of these structures in the left
hemisphere. In any event, we are now led to see, and this is my third point,
that genetically programmed variability is a condition of the child's learning
to speak. In general, the longer the life span of an animal, the longer the
period of parental care, and the more complex the mature behavior, the more
likely is the behavior to develop through an open genetic program (Mayr, 1974)
(though, for an exception, see below). Such a program relies on experience to
select and, if necessary, shape the needed behavior from a reservoir of
variable responses (cf. Fowler & Turvey, 1978).


Variability among children. As earlier noted, some individual
differences in the course of development are genetic or congenital in origin.
MacKain (in press) describes several extreme cases of children born without a
tongue who approach a surprisingly normal phonetic repertoire by an
idiosyncratic path of development. Yet other differences arise from the
plasticity of an open system, sensitive to environmental contingencies and
equipped with a variable repertoire of responses. Adaptive response to some
particular, short-term aspect of the environment may lead an individual down
an idiosyncratic path, because the precise order in which the parts of the
system assemble themselves is not preordained. Here we may draw a useful
analogy with the self-stabilizing processes in embryological development
termed "canalization" (Waddington, 1966, p. 48). Waddington describes how
various regions of an embryo differentiate into eyes, arms, legs, and so on.
Each region has many possible paths to the same end. The exact path is
determined, in part, by chance factors in the embryonic environment; equifinality
is assured by fixed constraints inside and outside the developing region.
Similarly, we may suppose, no single path is prescribed for the development of
a phonological system. Many paths, determined by partially fixed, partially
variable perceptual, motoric, and social conditions lead to the same end
(cf. Lindblom, MacNeilage, & Studdert-Kennedy, 1983).

Certainly, there may be a "normal" path, the product of articulatory
proclivity (or "ease") (Locke, 1983) and perceptual salience. But a child can
readily be diverted from the path by accidents of the speech it hears or of
its physical structure and growth. For example, if final fricatives become
salient for a particular child, due to chances of adult lexicon in some
recurrent situation, the child may try them and be successful, yet be unable
(through lack of consonant harmony in the target word or other "output
constraints" [Menn, 1978b]) to execute the initial consonants of the words. A
vowel-fricative routine is then established that the child can bring to bear
on words that most children would attempt with the standard stop-vowel
sequence, followed by a "deleted" fricative (e.g., Waterson, 1971, p. 185). Yet
the deviant child will ultimately come upon the same phonological system as
its peers.

Here we should note that even quite simple behaviors in non-human animals
may develop through an open genetic program. The filial and sexual imprinting
of mallard ducklings or domestic chicks on slow-moving objects (such as a
walking human, or even a red plastic cube revolving on the arm of a rotary
motor [Vidal, 1976]) is well known. The effect is possible because genetic
"instructions" are loose: they do not specify the form and color of the
mother bird, but only her typical rate of movement. Evolution can afford such
imprecision because the normal environment provides the duckling with only one
slow-moving object, its mother. If the combination of gross genetic
"instructions" and a more or less invariant environment permits essential functions
(here, protection from predators and species identification) to develop, there
will be no selective pressure for more exact genetic specification.

For the imprinting of precocial birds, the behavior is roughly fixed,
while eliciting conditions are only loosely specified. For the development of
language, both the behavior and the eliciting conditions are loosely
specified. Presumably, the infant has certain minimal, perhaps quite general,
capacities (its "initial state"), including sensitivity to the contingencies
of its own behavior, the basis perhaps of social responsiveness (Watson, 1972,
1981), while the social environment normally offers the infant certain
more-or-less invariant invitations to interact. So, within weeks of birth we
find the infant watching intently its mother's eyes, face and hands, as she
talks and plays, and we detect certain inchoate communication patterns in
postures of the infant's head, face, and limbs, and in "pre-speech" movements
of tongue and lips (Trevarthen, 1979). But, at this stage, not even the
modality of language is fixed. For if the infant is born deaf, it will learn
to sign no less readily than its hearing peer learns to speak. Thus, the
neural substrate is also shaped by environmental contingencies; and the left
hemisphere, despite its predisposition for speech, is then usurped by sign
(cf. Neville, 1980; Neville, Kutas, & Schmidt, 1982; Studdert-Kennedy, 1983,
pp. 175 ff. and pp. 219 ff.). In fact, recent studies of "aphasia" in native
American Sign Language signers show remarkable parallels in forms of breakdown
between signers and speakers with similar left hemisphere lesions (Bellugi,
Poizner, & Klima, 1983).

The differences between deaf and hearing individuals are certainly gross.
Yet every child grows in its peculiar niche with its peculiar anatomical and
physiological biases, and must therefore discover its own "strategy" for
fulfilling the human communicative function. (The term "strategy" should be
stripped of its cognitive, not to say military, connotations in this context,
as it is in standard ethological usage.) Indeed, language, as a
sociobiological system, exploits the potential for diverse strategies to mark social
groups by channeling speakers into distinctive linguistic styles and
dialects, to which, of course, children are highly sensitive (e.g., Local,
1983). Thus, individual differences and individual adaptive response make
language a force for social cohesion and differentiation. (For examples of
stable diversity within species of bee, treefrog, anemonefish, ruff, and other
animals, see Krebs and Davies, 1981, chap. 8.)

Finally, individual differences offer an opening for research.
Presumably, there are limits on possible strategies. But what these limits may be
we do not know. As data from longitudinal studies of individual children
accumulate, strategies may cluster, until it is possible to sketch their
limits. Such work may lead toward clearer notions of "perceptual salience" and
"ease of articulation." Thus, we come back to the constraints on individuals
by which phonological elements emerge and phonological systems organize
themselves (Lindblom et al., 1983).

Footnotes

¹The periods used here are not fixed "stages" of development. They are
simply convenient headings that correspond roughly to a period before babbling
(0-6 months) on which most of the infant perceptual research has focussed, and
a period of babbling (7-12 months) on which there has been very little
perceptual research.

²This interpretation assumes that arbitrary groups were, in fact, more
difficult to discriminate than "phonetic" groups. Perhaps it is easier to
detect a difference between groups, if all members of one group differ from all
members of another group on the same dimension ("phonetic") than if each
member of one group differs from each member of another on a different dimension
(arbitrary). The difference in task difficulty might then be great enough to
show up, if the criterial response is itself relatively difficult (head
turning), but not if the response is relatively easy (high amplitude sucking).

³Jusczyk (1982) makes the same point, proposing the "...possibility...
that...recognition of phonetic identities is not achieved until the child
is engaged in learning how to read" (p. 365, footnote 3). If "recognition"
here means "metalinguistic awareness," Jusczyk may be right. But functional
categories surely predate the alphabet, both ontogenetically and historically.
The alphabet (like dance notation) can only succeed because its units
correspond to functional units of perceptuomotor control. The task for the child,
learning to read, is to discover these units in its own behavior.

⁴I am not proposing that language can take any arbitrary form. On the
contrary, its general form, that is, its two-leveled hierarchical structure of
phonology and syntax, emerges necessarily from its function. Innumerable
details of form within these levels must result from more-or-less invariant
perceptuomotor, cognitive and pragmatic constraints, of which we know, at
present, very little.



A comment on C. A. Ferguson's "Discovering sound units and constructing sound
systems: It's child's play"



Michael Studdert-Kennedy†



The variability discussed by Ferguson is, of course, quite different from
the variability that has been the focus of much speech research since its
inception, and especially of research by Ken Stevens. For this focus has been
on what we might call lawful variability: the goal has been to discover the
invariants presumed to underlie regular variations in the articulatory and
acoustic structure of phonetic elements as a function of stress, rate, and
context. Ferguson's concern, on the other hand, is with the seemingly
unlawful (certainly unpredictable and therefore, in effect, random) variability of
early child speech, both within and across children. Moreover, Ferguson's
work is mainly concerned with production, while Stevens' interests (at least
as they bear on child phonology) have largely been in the problem that
acoustic variability poses for perception. Finally, even the unit of variation
that occupies Ferguson, namely the word, differs from the familiar units of
concern in speech research. In spite (or because) of these differences, I
believe that the work Ferguson discusses may carry the seeds of a new and
fruitful approach to the notorious puzzles of segmentation and invariance.

My purpose here is to trace some implications of what Ferguson describes,
as he follows the emergence of the child's first words over roughly the third
half-year of life. The unit of contrast at this stage, Ferguson tells us, is
the word defined as "...any apparently conventionalized sound-meaning pair."
The emphasis on function is important. The word is a unit of contrast because
it is a unit of meaning, offered by the surrounding language and commensurate
with the child's cognitive grasp. This does not imply that other structures
are not already being put to contrastive use; for they certainly are, as
Menn's (1978) study of early intonation, for example, has shown us. However,
it is Ferguson's hypothesis that the word is the simplest non-prosodic unit
with which a child can begin to accomplish some part of its communicative
intent.

*To appear in J. S. Perkell & D. H. Klatt (Eds.), Invariance and variability

An important implication of the claim that the word is the unit of
contrast is that smaller units, that is, phone-sized segments and features, are
not. This does not mean that acoustic correlates of phones and features
cannot be described in the utterances of a child. Nor (as we shall see shortly)
does it mean that words are perceived by the child as unanalyzed integers.
All that it means is that these smaller units have not yet taken on, for the
child, the systemic function of contrast that they serve in the adult.

To elaborate these notions somewhat, let us speculate briefly on how the
child perceives and produces words. Most early words are open monosyllables,
or reduplicated syllables, formed by the child's closing and then opening its
mouth, usually while its vocal cords vibrate. What must the child do, if it
is to close and open its mouth in such a way that the acoustic consequences
will count as a word? (Here I disregard so-called "proto-words," recurrent
phonetic structures that cannot be traced to an adult model.) First, of
course, the child must execute the act in some appropriate set of
circumstances, a remarkable cognitive achievement that we will set aside. Second,
from a phonetic point of view, the child must find, in the acoustic structure
of an adult word, information that will specify its own articulators'
movements (cf. Browman & Goldstein, ms.). Third, the child must execute those
movements.

At the risk of laboring the obvious, let us roughly spell the process
out. Suppose, for example, that a child utters [me], while reaching for a
cup, and that an observing adult happily recognizes an attempt at [milk].
Evidently, the acoustic structure of the adult word specified at least the
following gestures in a more or less precise temporal arrangement: (1) set
larynx into vibration, (2) raise jaw and close lips, (3) lower jaw and open
lips, (4) raise velum, (5) raise tongue. Thus, the perceptual representation
that controls the child's movements must already have been "segmented" to the
extent that it specified the actions of distinct and partially independent
articulators.

We may view these actions and their acoustic specifications as precursors 
of systematic phonetic features, if we wish. But we should not be misled 
thereby into assuming that the child classifies speech sounds perceptually 
according to invariant properties shared across contexts. Indeed, evidence 
for this capacity in infants is quite equivocal (for discussion, see
Studdert-Kennedy, this volume).

Consider, here, the ideal case of a child's first word, or, perhaps, first
imitation of an adult segmental sound pattern. If the event follows the model
sketched above for [me], the child has no need to have "recognized" that
components of the acoustic information belong to classes of components whose
members occur in other contexts. All that is required is that the acoustic
information specify a pattern of articulator action in this word. Thus, for
the child, its first word (and indeed every word in its early repertoire) is
phonetically unlike every other word in almost every respect. This is the
implication, it seems to me, of the claim that the word is the unit of
contrast.

To elaborate, let us take the syllables [dae] and [di], treating them, for
present purposes, as items in a child's repertoire. The first syllable of the
adult models may have had flat or falling, the second rising, second and
third formant transitions, a frequently cited example of a lack of
invariance. However, on the present view, we need not suppose that the
perceptual representations controlling the syllable onsets, when the child
combines them to utter [daedi], are identical. Rather, if the child is
tracking the gestures in the speech it hears, it will find a slightly
retracted alveolar contact followed by backward movement of the tongue, in
the first syllable and a slightly fronted contact, followed by forward
movement of the tongue, in the second, and so will produce just the so-called
"coarticulated" pattern it has heard. As the range of contexts in which a
child hears and produces alveolar closure and release widens, an
auditory-articulatory class may be formed. However, the class qua class
initially has no function. Any particular instance of alveolar closure and
release is perceived or produced as an idiosyncratic articulatory routine
contributing to formation of the particular word to which it belongs.

I will not speculate further on the processes by which recurrent
articulatory routines or gestures may crystallize into classes of control
structures, or phonemes, contrasting systematically in terms of their
defining features. These are matters for the child phonologist. But I have
two brief disclaimers.

First, the notions sketched above in no way cast doubt on possible functions
of features and phonemes in later language. The function of the phoneme, for
example, as a control structure in speaking, is demonstrated by the fact that
most normal children can learn to consult their own productions and to write
alphabetically (sometimes even before they can read). A system of behavioral
notation (as in the alphabet, music, and dance) could only serve as a set of
instructions to behave if the instructions matched already existing control
structures. Just as the bicycle was a technological discovery of new
behaviors implicit in the cyclical mode of human locomotion, so the alphabet
must have been a discovery of new behaviors, reading and writing, implicit in
the motor control of human speech.

My second disclaimer is that the view taken here has any bearing on whether
we may or may not be able to arrive at satisfactory descriptions of invariant
classes in the articulatory and acoustic structures of speech. My intent is
merely to raise the possibility that such invariants would be simply
descriptive, an outcome rather than a condition of development. Invariants,
as invariants, may have no necessary function for the child learning to speak.
BRIEF COMMENTS ON INVARIANCE IN PHONETIC PERCEPTION*



A. M. Liberman†



According to the instructions of my hosts, I have ten minutes to tell how 
I see the matter of invariance. So, getting right to the point, I should say
that my concern is with invariance only in the conversion from sound to 
phonetic structure, then move immediately to the facts that such invariance 
ought, in my view, to take into account. 

Because of the way we speak, the acoustic information for a phonetic segment
commonly comprises a large number and wide variety of cues, most of them
dynamic in form. These cues span a considerable stretch of sound, grossly
overlap the cues for other segments, and are subject to a considerable amount
of context-conditioned variation.

The phonetic perceiving system is sensitive — one might say
exquisitely sensitive — to all the acoustic cues. None of them is
truly necessary; all are normally used; and their relative impor- 
tance bears little relation to their salience as it might be 
reckoned on a purely auditory basis. 

Perception of phonetic structure is immediate in the sense that there is no
conscious mediation by, or translation from, an auditory base. This is to
say, most generally, that listeners are only aware of the coherent phonetic
structure that the cues convey, not of the quite different auditory
appearances the cues might be expected to have, given their overlap,
context-conditioned variation, number, diversity, and dynamic nature. Thus,
taking stop consonants and their dynamic formant-transition cues as a
particular example, I note that listeners are not aware of the transitions as
pitch glides (or chirps) and also as (support for) a stop consonant;
listeners are only aware of the stop. Yet these same formant transitions are
perceived as pitch glides (or chirps) when — on the nonspeech side of a
duplex percept, for example — they do not figure in perception of a phonetic
segment.

These facts have two implications relevant to our concern. One is that the
invariance between sound and phonetic structure should be sought in a general
relation between the two that is systematic but special, not in particular
connections that are occasional and discrete. The relation we seek can be
seen to be systematic to the extent it is governed by lawful dependencies
among articulatory movements, vocal-tract shapes, and sounds, dependencies
that hold for all phonetically relevant behavior, not just for specific and
fixed sets of elements. The relation has got to be special because the vocal
tract and its organs are special structures that behave, most obviously in
coarticulation, in special ways. A second implication is that the special
relation between sound and phonetic structure is acted on in perception by a
system that is appropriately specialized for the purpose.

If the foregoing assumptions are correct, then the invariance in speech is
not unique. Rather it resembles, at least grossly, the kinds of special
invariances that are found in many perceptual domains. Accordingly, the
system that is specialized for phonetic perception can be seen as one of a
class of similarly specialized biological devices. All take advantage of a
systematic but special invariance between the "proximal" stimuli and some
property of the "distal" object. The result is immediate perception of that
which it is most important to perceive--namely, the properties that make it
possible to identify the invariant distal object.

Consider, as an example, visual perception of depth as determined by the
proximal cue of binocular disparity. There is a general and systematic, yet
special, relation between the distal property (relative distance of points in
space) and the proximal stimulus (disparity). The relation is general and
systematic in that it is governed by the laws of optical geometry and holds
for all points (within its range) and for all objects, not just for some. The
relation is special because it depends on the special circumstances that we
have two eyes, that they are so positioned (and controlled) as to be able to
see the same object, and that they are separated by a particular distance.
Neurobiological investigation has revealed an anatomically and
physiologically coherent system — a biological "module," if you will — that
is specialized to process the proximal disparity and relate it to the distal
depth. Given that specialization, perception of depth is automatic and
immediate: there is no conscious mediation by, or translation from, the
double images we would see if, in fact, we were perceiving the proximal
disparity as well as the distal property it specifies.

Other perceptual phenomena have the same general characteristics. Auditory
localization and the various constancies come immediately to mind, and, if we
put aside questions about phenomenal "immediacy," so too do such processes as
those that underlie echolocation in bats and song in birds. These are surely
specializations if only because each such process, or module, is as different
from every other as is the invariant relation it serves. The phonetic module
differs from many of the others in at least two ways.

To make one of the differences clear I would turn again to binocular
disparity and depth perception as representative of a large class. In this
case the distal object is "out there," a physical thing in the narrow sense
of physical, and the invariant relation between its properties and those of
the proximal stimulus is determined, as already indicated, by optical
geometry and the separation of the eyes. In speech, however, the distal
object — a phonetic structure — is a physiological thing, a neural process in
the talker's brain, and the invariant relation between its properties and
those of the proximal sound is determined in large part by neuromuscular
processes internal to the talker but available also to the listener. Thus,
the specialized phonetic module might be expected to incorporate a
biologically based link between production and perception. Such a link is not
part of the disparity module or of the other perceiving modules it
exemplifies, though it may very well characterize the "song center" module of
certain birds.

A second important difference in the nature of the invariance (and its
module) has to do with the question: What turns the module on? In the case
of binocular disparity, the answer is a quite specific characteristic of the
proximal stimulus—namely, disparity. Notice, however, that disparity has no
other utility for the perceiver but to provide information about the distal
property, depth. There are, accordingly, no circumstances in which the
perceiver could use the proximal disparity as a specification of, or signal
for, some other property. This is to say that disparity and the depth it
conveys do not compete with other aspects of visual perception such as hue,
form, etc., but rather complement them. Not so in phonetic perception. There
is, first of all, the fact that the speech frequencies overlap those of
nonspeech. More to the point, the formant transitions that we don't want to
perceive as chirps when we are listening to speech are very similar to
stimuli that we do want to perceive as chirps when we are listening to birds.
Thus, almost any single aspect of the proximal stimuli can be used for
perception of radically different distal objects: phonetic structures in a
talker's head or acoustic events and objects in the outside world. What
follows is that the module can hardly be turned on by some specific
(acoustic) property of the proximal stimulus. Not surprisingly, then, we find
in research on speech perception that the module is, in fact, not turned on
that way, but rather by some more global property of the sound. Thus, just as
in the perception of phonetic segments all cues are responded to but none is
necessary, so too in identifying sound as speech.

How, then, is the module turned on? What invariant property of the sound
causes the listener to perceive that the distal object is a phonetic
structure and not some nonlinguistic object or event? I offer a suggestion.
Suppose that auditory stimuli go everywhere in the nervous system that
auditory stimuli can go, including, of course, the language center. Suppose,
further, that the language center applies the principle: if the shoe fits,
wear it. What is decided, then, by the language center is the answer to the
question: could these sounds, taken quite abstractly, have been produced by
linguistically significant articulatory maneuvers, also taken quite
abstractly? If the answer is yes, then the module takes over the purely
phonetic aspects of the percept, and the auditory appearances are inhibited.
(Auditory aspects that are irrelevant to the phonetic, such as loudness or
hoarseness, are perceived, of course, as attributes of the same distal
object.) If the answer is no, then the phonetic module shuts down and the
ordinary auditory appearances of the stimuli are perceived. Hence the common
experience of those who work with synthetic speech that when the sound
includes configurations that the articulatory organs cannot produce, as well
as those it can, the percept breaks, correspondingly, into nonspeech and
speech. Phenomenally, the nonspeech stands entirely apart from, and bears no
apparent relation to, the speech, even though the acoustic bases for these
wholly distinct percepts were perfectly continuous. The same arrangement for
turning the module on (or off) might account for the fact that certain kinds
of acoustic patterns--for example, sine waves in place of formants--can be
perceived as speech or as nonspeech depending on circumstances that in no way
alter the acoustic structure of the stimulus. It also helps to explain how,
as in the unnatural procedures of duplex perception, we can disable the
mechanism that forces the choice between speech and nonspeech, and so create
a situation in which exactly the same proximal formant transition is
simultaneously perceived (in the same context and by the same brain) as
critical support for a stop consonant and also as a nonspeech chirp. At all
events, there is a kind of competition between phonetic perception and other
ways of perceiving sound. A consequence is that the phonetic module produces
a more or less distinct mode of perception in a way that modules like depth
perception do not. This phonetic mode accommodates a class of distal objects
that are distinguished, not only by their role in language, but also by the
special nature of the invariant relation by which they are connected to
sound.
PHONETIC CATEGORY BOUNDARIES ARE FLEXIBLE*
Bruno H. Repp and Alvin M. Liberman†



Introduction

In the grammatical domains of language we find no gradients, only
categories. Thus, gradations of, for example, tense (present - past), form
class (noun - verb), or even word (night - day) are everywhere absent.
Indeed, they are impossible, for syntactic, morphologic, and phonologic
devices do not permit of continuous variation. At the surface of language,
however, the situation is different. There, in the relation between phonetic
structure and sound, the role of the segments is categorical — a segment is,
for example, [d] or [g], not something in between — but the sound can vary
continuously. That being so, at least in synthetic speech, we can ask whether
the phonetic segments are categorical, not only in their linguistic function,
but also in the way they are perceived. The answer is a qualified "yes."
Other things equal, stimuli belonging to the same phonetic category are more
difficult to discriminate than stimuli on opposite sides of a phonetic
boundary. This phenomenon has long been known as "categorical perception"
(Studdert-Kennedy, Liberman, Harris, & Cooper, 1970). The research it has
generated, which was recently reviewed by one of us (Repp, 1984), is largely
concerned with the ability of listeners to detect stimulus differences within
the categories — that is, with the degree to which perception is perfectly
categorical—and with the conditions under which that ability can be made to
vary. Our concern in this chapter is rather with the conditions under which
the locations of the categories on a continuum can be shown to vary, and with
the implications of that variation for a theory about the nature of the
categories. More particularly, we will be concerned with the boundaries
between the categories (and with their movement), so before considering the
relevance to theory, we should justify our concern with the boundaries.

We take the boundary to be the point along the appropriate (acoustic)
stimulus continuum at which subjects classify stimuli into alternative
categories with equal probability. In the typical case of two (adjacent)
categories, this is simply the point corresponding to the 50-percent
cross-over of the response function. If more than one stimulus dimension is
varied, category boundaries may be represented by contours in a
multidimensional space (see, e.g., Oden & Massaro, 1978). The standard method
of obtaining category boundaries is to present a set of stimuli repeatedly
(and in random order) for identification as members of one class or another.
Several alternative methods — for example, a method of adjustment— have been
used, but all yield similar boundaries (Ganong & Zatorre, 1980).
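
As an illustration only, the 50-percent cross-over of a two-category response
function can be estimated by linear interpolation between the two adjacent
stimuli whose response proportions straddle 0.5. The sketch below is not the
procedure of any study cited here; the function name, the seven-step
continuum, and the response proportions are all hypothetical, and other
estimators (for example, fitting a sigmoid) would serve equally well.

```python
def crossover_boundary(stimulus_values, prop_category_b, criterion=0.5):
    """Estimate where the response function crosses `criterion`.

    stimulus_values: positions of the stimuli on the acoustic continuum.
    prop_category_b: proportion of "category B" responses to each stimulus.
    Returns the interpolated boundary, or None if the function never
    crosses the criterion. (Hypothetical helper, for illustration.)
    """
    points = list(zip(stimulus_values, prop_category_b))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 == criterion:
            return float(x0)
        # A sign change means the criterion lies between the two stimuli.
        if (p0 - criterion) * (p1 - criterion) < 0:
            return x0 + (criterion - p0) * (x1 - x0) / (p1 - p0)
    if points and points[-1][1] == criterion:
        return float(points[-1][0])
    return None


# Hypothetical identification data for a seven-step continuum.
steps = [1, 2, 3, 4, 5, 6, 7]
p_b = [0.02, 0.05, 0.20, 0.40, 0.80, 0.95, 1.00]
boundary = crossover_boundary(steps, p_b)
print(boundary)  # approximately 4.25, between stimuli 4 and 5
```

A boundary shift, in the sense discussed in this chapter, would then appear
as a difference between the values estimated under two conditions.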

Why do we take account only of the boundaries? After all, it is the
categories themselves, rather than the boundaries between them, that play the
important role in speech communication. Why not, then, deal with some
appropriate exemplar— the prototype, as it were — of the category? A
sufficient reason is that, until recently, no one had used methods designed
to identify the prototypes. Worse yet, the application of such methods has so
far not yielded entirely satisfactory results (Samuel, 1979, 1982). The
measurement of boundaries, on the other hand, has long been common in
research on speech; so the data are plentiful. Moreover, the boundaries do
inform us about the categories and, under some specifiable conditions, about
their positions on the appropriate acoustic continua. And, finally, as we
will say below, it is the boundaries, not the prototypes, that are central to
the assumptions underlying at least one of the important theories about the
categories.

Still, it is important to keep in mind that the location of a category
boundary is determined, not only by the listeners' internal representations
(the prototypes) of the categories, but also by the criterion they adopt for
deciding between two competing categories, which makes the boundary
vulnerable to biasing influences of various kinds. In principle, at least, a
change in the location of a boundary may result either from a change in one
or the other (or both) of the category prototypes, or from a criterion shift.


It is important to know whether, and under what conditions, the boundaries
between phonetic categories are flexible, because the question bears on two
very different hypotheses about the processes that underlie the
categorization. According to one hypothesis, the perceived categories result
from psychophysical discontinuities that directly reflect the characteristics
of the auditory system. Thus, given an acoustic stimulus continuum
appropriate for some phonetic distinction, a category boundary is assumed to
fall naturally at a point on the continuum where, owing to the way the ear
works, differential sensitivity undergoes a sudden change. Perhaps the most
general implication of this hypothesis is that auditory categories are the
stuff of which phonetic categories are made. Put another way, the implication
is that articulatory gestures are so governed as to produce sounds that fit
within the categories that the auditory system happens to provide.
Accordingly, we will refer to this as the "auditory" hypothesis. By any name,
it is the hypothesis, referred to earlier, that deals directly with the
boundaries of the categories rather than their ideal exemplars or prototypes.
As for movement of category boundaries, that is allowed under this
hypothesis, but only as a result of psychoacoustic factors that apply to
auditory perception in general, and only to the extent that such factors can
actually modify the patterns of differential sensitivity on which the
auditory boundaries rest.

The other hypothesis is that the boundaries are determined by category
prototypes that reflect typical productions of the relevant speech segments.
Accordingly, the prototypes and the boundaries between them need not conform
to discontinuities in the auditory system, but are, instead, free to be
precisely as flexible as the acoustic consequences of the articulatory
gestures require. In fact, considerable flexibility may be demanded. The
efficiency of phonetic communication depends crucially on the ability of the
several articulators to produce successive phonetic segments at the same time
(or with considerable overlap), and also to accommodate in other ways to
changes in phonetic context and rate. These maneuvers can produce systematic
changes in the way a particular phonetic segment is represented in the sound.
If the perceiving apparatus were not flexibly responsive to those changes,
communication would break down, or so it seems. Moreover, the inventory of
phones will itself change as language changes, and this, too, requires
flexibility in the prototypes. Our hypothesis is that a link between
perception and production (in most general terms) enables the category
prototypes to respond appropriately to articulatory or co-articulatory
adjustments, and so to mirror the talker's phonetic intent. Needing a
convenient name to refer to this hypothesis, and wishing to distinguish it
from the "auditory" hypothesis we described first, we will call it
"phonetic."

Our aim in this chapter is to bring together the many data that demonstrate
flexibility of a kind the phonetic hypothesis leads us to expect. These
pertain to the influences on perceived phonetic boundaries of such factors as
phonetic context, speaking rate, the mix of acoustic cues, and linguistic
experience. But there are other effects on the perceived boundaries about
which the auditory and phonetic theories are neutral. These include the
consequences of varying the range, frequency, and order of the stimuli, as
well as such phenomena as contrast and adaptation. Since effects of that kind
need to be distinguished from those that are more directly relevant to the
auditory and phonetic theories, we will consider them first. We will note,
however, that even these "simple" effects sometimes follow patterns that seem
difficult to reconcile with a purely auditory theory, and that suggest that
speech-specific perceptual criteria may play a role in certain situations.
Our review will be selective and focus especially on these instances.

Stimulus Sequence Effects 

Under this heading we consider influences on the perception of speech 
stimuli exerted by other, similar stimuli preceding or following them in a se- 
quence. These effects need to be distinguished from the "stimulus structure 
effects" discussed later, which concern perceptual dependencies within a sin- 
gle coherent speech stimulus or influences entirely due to factors within the 
listener.

It is generally agreed that vowel identification — of isolated steady-state
vowels, at least — is highly susceptible to all sorts of stimulus sequence
effects. On the other hand, the identification of consonants, and of stop
consonants in particular, is more stable and less sensitive to stimulus
context. This difference parallels the well-known difference between these
two stimulus classes in the extent of "categorical perception"; indeed, the
criterion of "absoluteness" (i.e., independence of surrounding stimuli)
constituted part of the classical definition of categorical perception
(Studdert-Kennedy et al., 1970). "Context sensitivity" in a sequence may be
distinguished on logical grounds, however, from the extent of the subject's
reliance on category labels in discriminating between stimuli (Lane, 1965;
Repp, Healy, & Crowder, 1979), and these two aspects of categorical
perception can, to some extent, be dissociated experimentally (Healy & Repp,
1982).

Local Sequential Effects

Local sequential effects — typically, influences of a preceding stimulus on
the identification of a following stimulus — may occur in any random test
sequence. These effects are pervasive in absolute identification, magnitude
estimation, and other psychophysical tasks involving nonspeech stimuli.
Surprisingly, there have been very few attempts to determine the extent of
sequential effects in standard speech identification tests, where stimuli are
presented in random order. Of course, there is an indirect test in the shape
of the labeling function, since it can be steep only if sequential effects
are relatively small.

In several studies of speech-sound identification, however, the stimuli have
been presented in balanced arrangements specifically designed for the
assessment of sequential context effects. In one of the earliest of these
studies, Eimas (1963) called for identification of stimuli presented in ABX
triads of the sort often used in discrimination tasks, and found large
context effects for isolated vowels (see also Fry, Abramson, Eimas, &
Liberman, 1962) and smaller, but by no means negligible, effects for both the
voicing and place dimensions of stop consonants. All effects were
contrastive — that is, a stimulus tended to be classified into a category
different from that of the stimulus it was paired with — and the magnitude of
the effect increased with the acoustic distance between adjacent stimuli.
Comparable results have been obtained more recently by, among others, Healy
and Repp (1982).
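
As a rough sketch of how a contrastive sequential effect of this kind might
be tabulated, the following computes, for one target stimulus, the proportion
of "B" responses as a function of the stimulus that immediately preceded it.
It is not the analysis used in any of the studies cited; the function name,
the stimulus numbering, and the trial data are hypothetical.

```python
from collections import defaultdict


def response_rate_by_context(trials, target_stimulus, response="B"):
    """Proportion of `response` given to `target_stimulus`, split by the
    stimulus presented immediately before it.

    trials: list of (stimulus, response) pairs in presentation order.
    Returns {preceding_stimulus: proportion}. (Hypothetical helper.)
    """
    counts = defaultdict(lambda: [0, 0])  # preceding -> [hits, total]
    for (prev_stim, _), (stim, resp) in zip(trials, trials[1:]):
        if stim == target_stimulus:
            counts[prev_stim][1] += 1
            if resp == response:
                counts[prev_stim][0] += 1
    return {prev: hits / total for prev, (hits, total) in counts.items()}


# Hypothetical sequence: stimulus 1 is a clear "A", 7 a clear "B",
# and 4 is ambiguous.
trials = [(1, "A"), (4, "B"), (7, "B"), (4, "A"),
          (1, "A"), (4, "B"), (7, "B"), (4, "A")]
rates = response_rate_by_context(trials, target_stimulus=4)
print(rates)
```

In these made-up data the ambiguous stimulus is labeled "B" after the "A"
endpoint and "A" after the "B" endpoint, which is the contrastive pattern
described above.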

Although sequential effects are generally considered to be common to speech
and nonspeech stimuli, there are some intriguing differences. For example, it
has been found in several studies that the magnitude of the contrast effect
is greater for continua of isolated vowels than for nonspeech continua such
as pitch or duration (Eimas, 1963; Fujisaki & Shigeno, 1979; Healy & Repp,
1982; Shigeno & Fujisaki, 1980). While it is possible that the difference is
to be accounted for by the more complex acoustic (and auditory) nature of the
vowels (and there are also problems with comparing the magnitudes of contrast
effects across different stimulus continua), it may, with equal plausibility,
be taken to reflect a flexibility of categorization peculiar to the class of
vowel sounds, a class that happens to carry the major burden of dialectal
variation and language change.

If two or more stimuli in a sequence must be held in memory before a
response is permitted, as in the procedure of Eimas (1963) described above,
the effects of the stimuli on each other are retroactive as well as
proactive. Interestingly, retroactive effects tend to be larger than
proactive effects for isolated vowels, while the opposite tends to be the
case for all other types of stimuli examined, whether speech or nonspeech
(Diehl, Elman, & McCusker, 1978; Healy & Repp, 1982; Shigeno & Fujisaki,
1980). This finding, like the one having to do with the magnitude of
contrast, may be explicable by acoustic stimulus properties alone, or it may
reflect a specific tendency, derived perhaps from experience with fluent
speech, to revise tentative decisions about vowel categories in the light of
later information.

One reason we consider that even simple sequential effects may exhibit
speech-specific patterns is that these effects almost certainly take place in
two quite distinct ways, one reflecting a sensory effect, the other a
judgmental effect (see Simon & Studdert-Kennedy, 1978). That is, there may be
an effect of a preceding stimulus on the sensory representation of a
following stimulus (as well as the reverse, if both are held in a
precategorical memory store), but the judgment of a stimulus may also be
affected by the response that was assigned to the preceding or following
stimulus, usually in a contrastive fashion. Whereas the purely sensory
effects are presumably shared by speech and nonspeech stimuli and are
sensitive to factors such as spectral similarity and temporal proximity
(Crowder, 1981, 1982), the special structure and function of phonetic
categories may produce criterion shifts in the response domain that are
specific to speech. Although a clear separation of stimulus and response
effects has rarely been achieved in speech experiments, separate studies
provide evidence for each type. Thus, Crowder (1982) has shown that proactive
contrast effects for isolated vowels decrease with temporal separation over
about 3 s in a manner that parallels the decay of auditory sensory storage in
other paradigms. On the other hand, Sawusch and Jusczyk (1981) found that
sequential contrast depended more on the perceived category of the preceding
stimulus than on its acoustic structure. Judgmental effects may depend in
part on whether or not a response to the contextual stimulus is required: A
comparison of Crowder's (1982) data with those of Repp et al. (1979) for
isolated vowels suggests that proactive contrast effects are reduced when
only the second stimulus in a pair requires a response. (It goes almost
without saying that retroactive contrast effects would be reduced or
eliminated if only the first stimulus in a pair were responded to.)

The distinction between sensory and judgmental components of sequential
effects is also familiar in nonspeech psychophysics (e.g., Petzold, 1981) and
is compatible with Braida and Durlach's (1972) two-factor theory of
perceptual coding (see Macmillan's chapter, this volume). Thus, Petzold
(1981) has found that preceding stimuli exert a contrastive effect while
preceding responses exert an assimilative effect. On the other hand, Shigeno
and Fujisaki (1980) have proposed a two-factor model for sequential effects
in speech and nonspeech that predicts precisely the opposite. The limited
data available suggest, on the contrary, that for speech both components of
sequential effects are contrastive in nature.

Global Sequential (Range-Frequency) Effects

Shifts in phonetic category boundaries may occur as a consequence of
variations in the overall composition of a stimulus sequence -- that is, the
range of stimuli employed and the frequency of occurrence of the individual
stimuli. In general, if the stimulus range is shifted or expanded in a
certain direction, the boundary will shift in the same direction; and if one
stimulus (typically one of the endpoints, the "anchor") occurs more frequently
than other stimuli, the boundary will shift toward it. In other words, the
effects are contrastive in nature, and, in the case of speech sounds, they
exhibit variations in magnitude similar to those observed for simple
sequential effects: For stop consonants varying in place or voicing, the
effects are small (Brady & Darwin, 1978; Rosen, 1979), while for isolated
vowels (Sawusch & Nusbaum, 1979), certain other consonantal contrasts (Repp,
1980), and even for stop consonants in Polish (Keating, Mikoś, & Ganong,
1981), they may be quite large.
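
A boundary shift of the kind described above is commonly quantified as the
change in the 50% crossover point of the identification function. The
following is a minimal sketch of that computation, assuming linear
interpolation between adjacent continuum steps. The function name
`crossover` and the response proportions are hypothetical illustrations, not
data or procedures from any of the studies cited.

```python
def crossover(steps, prop_a):
    """Estimate the category boundary as the continuum value at which the
    proportion of category-A responses crosses 0.5.

    Linear interpolation between neighbouring steps is an assumption of
    this sketch. Returns None if the function never crosses 0.5.
    """
    for i in range(len(steps) - 1):
        x0, x1 = steps[i], steps[i + 1]
        p0, p1 = prop_a[i], prop_a[i + 1]
        # A crossing lies between two steps whose proportions straddle 0.5.
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


# Hypothetical identification data for a 7-step continuum (step 1 = category A).
steps = [1, 2, 3, 4, 5, 6, 7]
balanced = [1.0, 0.95, 0.85, 0.6, 0.3, 0.1, 0.0]   # equal-frequency condition
anchored = [1.0, 0.9, 0.7, 0.4, 0.2, 0.05, 0.0]    # step 1 presented more often

shift = crossover(steps, anchored) - crossover(steps, balanced)
print("boundary shift (in steps):", round(shift, 2))
```

With these made-up numbers the boundary moves from about step 4.33 to about
step 3.67, that is, toward the frequently presented endpoint, which is the
contrastive pattern the text describes.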

An interesting asymmetry has been observed in the anchoring paradigm for
isolated vowels (Sawusch, Nusbaum, & Schwab, 1980): An analysis of anchoring
effects on an /i/-/I/ continuum suggested that the effect of the /i/ anchor
was due to sensory adaptation while that of the /I/ anchor represented a
change in response criterion. In a recent and similar study, in which the
anchor always came first in a stimulus pair and only the second stimulus
required a response, Crowder and Repp (1984) found an effect of /i/ but not of
/I/. The explanation for this asymmetry may be found in the acoustics of the
stimuli; alternatively, it may be owing to the special status of /i/ as one
of the corners of the vowel space.

We should note, perhaps, that although range-frequency effects are usually
considered to derive from stimulus context beyond the immediate local
environment, they are often confounded with sequential probabilities: If a
given endpoint stimulus (the anchor) occurs more often than other stimuli, the
probability that a given stimulus is immediately preceded by the anchor will
be increased relative to an equal-frequency (or a different anchoring)
condition. Similarly, if the stimulus range is shifted or expanded in one
direction, the likelihood that certain critical stimuli are preceded by other
stimuli from that part of the continuum is increased. Therefore,
range-frequency effects may in many cases be just local sequential effects in
disguise. The extent to which nonlocal stimulus context makes any additional
contribution has, to our knowledge, not been ascertained experimentally for
speech stimuli. It is possible, however, that the frequent occurrence of a
single stimulus has an additional adapting influence not evident in regular
balanced stimulus sequences. In that sense, the anchoring paradigm
approximates the selective adaptation paradigm, to be discussed next.
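
The confound between anchoring and sequential probabilities can be made
concrete with a little arithmetic. The sketch below assumes a fully
randomized test sequence (a random permutation of a fixed set of
presentations) and ignores the first trial, which has no predecessor. The
function name and the presentation counts are hypothetical.

```python
from fractions import Fraction


def prob_preceded_by_anchor(counts, anchor, target):
    """Probability that a presentation of `target` is immediately preceded
    by `anchor` in a random permutation of the whole sequence.

    counts maps each stimulus to its number of presentations. Given the
    target's position, the preceding slot is equally likely to hold any of
    the other (total - 1) presentations.
    """
    total = sum(counts.values())
    n_anchor = counts[anchor] - (1 if target == anchor else 0)
    return Fraction(n_anchor, total - 1)


# Hypothetical 7-step continuum: 10 presentations per stimulus, versus a
# condition in which the step-1 anchor is presented 40 times.
balanced = {step: 10 for step in range(1, 8)}
anchored = dict(balanced)
anchored[1] = 40

p_balanced = prob_preceded_by_anchor(balanced, anchor=1, target=4)
p_anchored = prob_preceded_by_anchor(anchored, anchor=1, target=4)
print(p_balanced, p_anchored)  # 10/69 40/99
```

Under these assumptions a mid-continuum stimulus follows the anchor on about
14% of trials in the balanced condition but on about 40% of trials in the
anchored condition, so a purely local contrast effect would by itself predict
a boundary shift.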

Selective Adaptation 

In selective adaptation experiments, an adapting stimulus (frequently one
or the other endpoint stimulus of a speech continuum) is presented many
times before responses to a few test stimuli are collected. The original
motivation for using this paradigm in speech research was the assumption that
the effects of the adapting stimulus might reveal the existence and nature of
"phonetic feature detectors" (Eimas & Corbit, 1973; see Remez's chapter, this
volume). Apart from the difficulty of conceiving that phonetic features
(e.g., place, manner, voicing) could possibly be perceived by detectors that
respond to such simple features as the auditory analogs of edges and angles in
vision (see, e.g., Diehl, 1981; Studdert-Kennedy, 1981; Remez, this volume),
a large number of experiments suggest that the effects of selective adaptation
take place primarily at the auditory, not the phonetic (judgmental), level.
(However, see Elman, 1979.)

The most striking demonstrations of the auditory (as opposed to the
phonetic) nature of selective adaptation were provided in two recent studies.
In one of these, Roberts and Summerfield (1981) presented audiovisual adapting
stimuli that, due to the overriding influence of a conflicting visual display,
were never classified into the category normally associated with the auditory
stimulus. Nevertheless, the audiovisual adaptors had exactly the same
influence on the identification of auditory test stimuli as did purely
auditory adaptors. Thus, the phonetic category assigned to the adaptors
seemed to play no role in selective adaptation. A similar result was obtained
by Sawusch and Jusczyk (1981), who used adaptors of the form /spa/, in which
the stop consonant was phonetically classified as "p" but acoustically
identical with the initial "b" in /ba/. The adapting effects of /spa/ and /ba/
did not differ. These studies, together with several earlier attempts to
dissociate acoustic and phonetic stimulus properties (Blumstein, Stevens, &
Nigro, 1977; Sawusch & Pisoni, 1976), suggest that selective adaptation with
speech is an exclusively auditory phenomenon. Even though studies of
interaural transfer of adaptation effects suggest more than one site at which
adaptation takes place (Ganong, 1978; Sawusch, 1977), both of these sites
appear to be auditory (i.e., nonphonetic) in nature.

Repp & Liberman: Phonetic Category Boundaries Are Flexible 

There are two types of evidence, however, that do indicate some
involvement of phonetic processing in selective adaptation. One has to do
with the influence of the listeners' native language. The relevant finding is
that selective adaptation effects on the same stimulus continuum are different
for American and for Thai listeners, as independently demonstrated by Donald
(1976) and Foreit (1977). The continuum was one of stop consonants varying in
voice onset time (VOT), ranging from prevoiced (voicing lead) to devoiced (0
ms VOT) to aspirated (voicing lag). For American listeners, who do not
distinguish prevoiced and devoiced stops, a -60 ms VOT and a 0 ms VOT adaptor
had the same effect on the category boundary. For Thai listeners, on the
other hand, who have three separate categories on the continuum, only the 0 ms
adaptor affected the devoiced-aspirated boundary while the -60 ms adaptor was
ineffective. This finding agrees with earlier results of Cooper (1974)
showing that, on a place-of-articulation continuum divided into three
categories, adapting stimuli affected only the adjacent but not the remote
category boundary.

The other piece of evidence for a role of phonetic categorization in
selective adaptation comes from studies that have revealed differences in the
effectiveness of adaptors as a function of their distance from the category
boundary. In general, the effectiveness of an adaptor increases with its
distance from the boundary (Ainsworth, 1977; Cole & Cooper, 1977; Miller,
1977a), unless it crosses another phonetic boundary (Cooper, 1974; Donald,
1976; Foreit, 1977). Of course, this may be just another instance of the
well-confirmed fact that the spectral similarity of adaptor and test stimuli
is the major determinant of the size of the adaptation effect. In other
words, the distance effect may have a purely auditory explanation. In a
recent study, however, Miller et al. (1983) demonstrated that, even if no
other phonetic boundary intervenes, the adaptation effect does not increase
indefinitely as the adaptor moves away from the boundary, but instead reaches
a maximum and then declines (or, for some subjects, remains on a plateau).
The adaptor that produces the maximum effect has characteristics that may
reasonably be assumed to be optimal for its category, which led Miller et al.
to conjecture that the size of the adaptation effect is related to the
adaptor's distance from the listener's internal category prototype.
Preliminary support for this hypothesis was obtained by Miller et al. in a
condition in which the category boundary on a /ba/-/wa/ continuum, and with it
the presumable location of the /wa/ prototype (cf. Miller & Baer, 1983), was
made to shift by reducing the duration of the syllables. The peak in the
function relating the size of the adaptation effect to the location of the
adaptor on the continuum shifted accordingly, as predicted.

Even stronger support for a role of "category goodness" in selective
adaptation comes from a study by Samuel (1982). He first asked his subjects
to locate the optimal /ga/ on a /ga/-/ka/ VOT continuum. The subjects were
then divided into two groups -- those with short-VOT and those with long-VOT
prototypes. Two adapting stimuli matching the two average prototypes were
then selected. For each group of subjects, the adaptor matching the group's
prototype produced the larger boundary shift. Since exactly the same adaptors
were used for both groups, the listeners' internal category prototype seemed
to be responsible for the magnitude of the adaptation obtained.

These recent results lead to the tentative conclusion that selective
adaptation takes place at an auditory level that is phonetically relevant.
Perhaps this should not come as a surprise. The adapting stimuli, after all,
are speech and therefore are phonetically relevant auditory patterns.
Conversely, the internal standards or category prototypes against which
listeners presumably compare stimuli in the process of categorization must
entail detailed auditory specifications; otherwise, in the absence of a common
metric, the comparison would be impossible. Selective adaptation may then be
viewed as a temporary modification of the prototype itself -- a weakening of
the criterial specifications that is proportional to the degree to which the
auditory input meets those specifications. With this interpretation, the
results reviewed above can be reconciled with the numerous earlier
demonstrations of "purely auditory" effects in selective adaptation.

From this vantage point, the various "low-level" effects reviewed so
far -- sequential contrast, range-frequency effects, and selective
adaptation -- are relevant to the topic of our paper, the flexibility of
phonetic boundaries. In essence, the data seem to show that not even a
psychophysical procedure like selective adaptation has its effects exclusively
at a "general auditory level" of processing; rather, as long as the adapting
stimuli are speech, their effects reflect the extent to which they engage the
speech processing apparatus. Since speech stimuli ordinarily engage the
mechanisms of phonetic categorization (even in the absence of an overt or
covert response), selective adaptation with speech is properly viewed as a
speech-specific phenomenon -- a modification of the frame of reference within
which speech stimuli are interpreted. The same is true for range-frequency
and sequential contrast effects, except that overt responses to contextual
stimuli may have additional effects at a judgmental level. In other words,
although speech must pass through the auditory nerve, there may be no "general
auditory" level of representation beyond the peripheral transduction. Speech
perception takes place within a pre-established frame of reference, and the
auditory representation of speech cannot be separated from the (equally
"auditory") internal structures, due to cumulative experience in conjunction
with biological predispositions, through which the incoming information is
filtered.

Stimulus Structure Effects 

Under this heading we consider perceptual dependencies that arise among
different components of a single coherent speech stimulus. That stimulus may
be as short as a single syllable or as long as a whole sentence. Stimulus
structure effects, even though they are most easily revealed in the
laboratory, are closer to the real life situation than the stimulus sequence
effects discussed in the preceding section, which represent or exploit
artifacts of test sequence construction. Although the experimental induction
of selective adaptation or sequential contrast may be useful for the purpose
of probing perceptual mechanisms, there is no reason to believe that these
phenomena (as distinct from the mechanisms they reveal) play any significant
role in the perception of coherent speech. The various effects discussed in
the present section, on the other hand, have more direct implications for
normal speech perception, as they reflect the perceptual functions of
integration and normalization that make speech perception so effortless and
efficient.

Cue Integration Effects 

It is well known that distinctions among phonetic segments rest on a
multiplicity of acoustic cues in the speech signal. Typically, these many
cues are acoustically diverse, relatively widely distributed in time, and
overlapped with cues for other segments. Yet the perceiver somehow integrates
these diverse and distributed aspects of the speech signal to recover the
phonetic structure of the message (Liberman & Studdert-Kennedy, 1978; Repp,
Liberman, Eccardt, & Pesetsky, 1978). Exactly how the individual acoustic
cues are characterized depends to some extent on the methods of analysis and
experimental manipulation and on the descriptive framework chosen by the
investigator. From a purely acoustic point of view, however, they seem in
most cases to be incoherent. From an articulatory point of view, on the other
hand, they make sense -- that is, they reflect a unitary event in the domain
of articulatory planning.

The statement that there are multiple cues for each phonetic contrast 
must be qualified by the fact that some cues are more important than others. 
That is, some cues are easily overridden by others. Listeners' sensitivity to 
the weaker cues can be demonstrated in the laboratory by eliminating the 
stronger ones or by setting them at ambiguous values. From the existing evi- 
dence it can indeed be concluded that, given the opportunity, listeners will 
make use of any cue for a given phonetic distinction (Bailey & Summerfield,
1980). This general observation suggests that, as Bailey and Summerfield 
(1980) have pointed out, the concept of cue has limited theoretical relevance. 
As a practical matter it is useful, even essential, in dealing with the
acoustic basis of speech perception. But the sensitivity to the many and
various cues for a phonetic segment suggests, as we have already implied, that
listeners are perceiving just what all the cues have in common -- viz., some
economical representation of the coherent process underlying the peripheral
articulation.

The relevance of cue integration to the topic of our chapter is evident
when we consider that a phonetic category boundary is usually determined on a
continuum of stimuli varying in only one important cue dimension. The
flexibility of that phonetic boundary may then be assessed by introducing
other, usually less important, cues that favor either one or the other
response alternative. That boundaries are indeed flexible in this particular
sense has been demonstrated in numerous studies. (For a recent review, see
Repp, 1982.) By definition, phonetic boundaries are located at the point of
maximal ambiguity, where weaker cues have their strongest effect. The
perceptual cue integration, or phonetic "trading relation," revealed by the
boundary shift generally takes place without the listener's awareness.
Perception tends to remain categorical even in the presence of multiple
acoustic differences among stimuli (see, e.g., Fitch, Halwes, Erickson, &
Liberman, 1980).
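
As a purely descriptive illustration of a trading relation, the sketch below
combines a primary and a secondary cue in a logistic decision rule and solves
for the 50% boundary on the primary-cue continuum. The logistic form, the
function names, and all weights are assumptions chosen for illustration; they
are not a model proposed in this chapter or in the studies cited, and the
sketch takes no position on whether the integration is auditory or phonetic
in origin.

```python
import math


def p_category_b(primary, secondary, w_primary=1.5, w_secondary=0.6, bias=-6.0):
    """Probability of a category-B response given two cue values.

    The weighted sum passed through a logistic function is an assumed,
    hypothetical combination rule; the weights are arbitrary.
    """
    z = bias + w_primary * primary + w_secondary * secondary
    return 1.0 / (1.0 + math.exp(-z))


def boundary(secondary, w_primary=1.5, w_secondary=0.6, bias=-6.0):
    """Primary-cue value at which p_category_b equals 0.5 (where z = 0)."""
    return -(bias + w_secondary * secondary) / w_primary


# A secondary cue favoring category B (+1) moves the boundary toward the
# category-A end of the primary continuum; one favoring A (-1) moves it away.
for secondary in (-1.0, 0.0, 1.0):
    print(secondary, round(boundary(secondary), 2))
```

Because the secondary weight is small relative to the primary weight, its
influence on the response is largest near the boundary, where the primary cue
is ambiguous, and negligible near the endpoints of the continuum.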

The ubiquity of trading relations among acoustically diverse cues
provides one of the strongest arguments against theories that predict fixed
boundary locations on any acoustic speech continuum. In many cases, cues are
so disparate as to be extremely unlikely to engage in any direct
psychoacoustic interaction. Rather, what seems to unite them is that they are
common consequences of the articulatory gestures that differentiate phonetic
segments; at the same time, they are members of the set of structural acoustic
differences that characterize a particular phonetic contrast. To cite only
one specific example: The primary cue for the /s/-/ʃ/ distinction is the
spectrum of the fricative noise, but a secondary cue is provided by the voiced
formant transitions following the noise. The phonetic boundary on an /s/-/ʃ/
continuum, obtained by varying the spectral properties of the fricative noise,
is at different locations depending on whether the formant transitions are
appropriate for /s/ or for /ʃ/ (Mann & Repp, 1980). Considering that the
fricative noise is of relatively long duration, produced by a different
source, and of a spectral composition quite different from that of the
following signal, there is little reason to expect any direct effect of the
formant transitions on the auditory representation of the fricative noise.
Indeed, when listeners are led to focus on the "pitch" of the fricative noise
(rather than on the phonetic fricative category), there seems to be no
influence of the following formant transitions on their judgments (Repp,
1981). Thus, the perceptual integration of the cues provided by fricative
noise spectrum and formant transitions seems to be phonetically motivated and
related to the fact that different values of both cues are consistently
correlated with different places of fricative production. Similar arguments
may be applied to other phonetic trading relations, even including those that
could, in principle, result from some psychoacoustic interaction.

Feature Integration Effects 

The trading relations discussed in the preceding section (and reviewed by
Repp, 1982) take place among cues to a single phonetic feature -- e.g.,
voicing or place of articulation. This is a consequence of the fact that the
phonetic categories constituting the endpoints of a speech continuum nearly
always differ only in a single feature. Here we consider a related class of
effects that reveals perceptual dependencies among cues to different features
of the same phonetic segment. The main reason for considering these effects
separately is that they give the theorist an additional degree of freedom:
Feature interactions may be hypothesized to occur after a process of "feature
extraction" but before assembly of the features into a phonetic segment (see,
e.g., Miller, 1977b; Sawusch & Pisoni, 1974). For theorists who instead
postulate either direct psychoacoustic interactions among the cues or
reference to phoneme- or syllable-sized prototypes, the effects considered
here are further instances of cue integration (cf. Oden & Massaro, 1978).

The literature on genuine feature integration effects is rather small,
for it is difficult to vary cues for different features in a strictly
orthogonal fashion. A well-known finding is that the voicing boundary on a
VOT continuum is at increasingly larger voicing lags for labial, alveolar, and
velar stop consonants (Lisker & Abramson, 1970). In most studies, however,
the duration of the first-formant transition, which itself constitutes a
voicing cue (as well as a weak cue for place of articulation), covaried with
place of articulation, so that the boundary shifts may be considered as being
due to a simple trading relation among voicing cues. In one experiment,
however, the F1 transition was held constant (with only the F2 and F3
transitions varying to cue differences in place of articulation), and a small
but reliable voicing boundary shift as a function of place of articulation was
obtained (Miller, 1977b). (See, however, Massaro & Oden, 1980, for a failure
to replicate this result.) Subsequently, Miller (1977b) showed that the
boundary on a labial-alveolar place of articulation continuum shifted
depending on whether the stop consonants were synthesized as nasal, voiced, or
voiceless. She interpreted these results as revealing processing dependencies
among phonetic features. An alternative interpretation has been proposed in a
model that builds feature dependencies into prespecified criterial feature
values and so avoids any processing interactions after the feature extraction
stage (Massaro & Oden, 1980; Oden & Massaro, 1978). Because of the built-in
dependencies, however, the model rests on the assumption of phoneme- or
syllable-sized prototypes and merely pays lip service to phonetic features.

Feature interactions of the kind observed by Miller (1977b) presumably
reflect the inherent nonorthogonality of articulatory features and their
acoustic correlates. Clearly, the binary feature matrix devised by
phonologists is inadequate from a phonetic viewpoint. Initial velar stops,
for example, because of their longer VOTs, simply are relatively "more
voiceless" than labial stops. The possibility of psychoacoustic interactions
among signal components must be considered, but there is no well-supported
psychoacoustic explanation for the observed feature interactions.

One case in which a psychoacoustic interaction between feature dimensions
can definitely be ruled out is the finding (Carden, Levitt, Jusczyk, & Walley,
1981) that, given a single continuum of formant transitions, listeners place
the phonetic boundary at different locations depending on whether they are
instructed to hear the stimuli as stops ([ba], [da]) or as fricatives ([fa],
[θa]). This can only be accounted for as an adjustment -- and apparently a
perfectly automatic one -- for the fact that the places of production are
somewhat different for the two stops from what they are for the fricatives.
Hence, it becomes yet another example of the rule that phonetic categorization
is guided by internal criteria that reflect the prototypical acoustic and
articulatory characteristics of speech.

Segmental Context Effects 

A third class of perceptual interactions taking place within a single
utterance concerns perceptual dependencies among cues for different phonetic
segments. While the conceptual distinction from the two classes discussed
earlier (integration of cues to the same feature or to different features of
the same segment) is straightforward, practical distinctions are somewhat
fuzzy because acoustic cues generally cannot be apportioned exclusively to one
or the other phonetic segment. However, an experimental dissociation is
usually possible between those signal aspects that provide weak
(coarticulatory) cues to one segment and those that are strong and sufficient
cues for a different segment, even when both very nearly coincide in time.

For example, take the effect of a following vowel on fricative
perception, investigated -- among others -- by Mann and Repp (1980). The
periodic signal portion following a fricative noise necessarily has formant
transitions characteristic of the fricative's place of production, which
contribute to the fricative percept, particularly when the fricative noise
spectrum carries little distinctive information (Carden et al., 1981; Mann &
Repp, 1980). Therefore, this effect belongs under the heading of cue
integration. The identity of the vowel itself, however, is quite independent
of the preceding fricative and therefore cannot provide any direct cues to
fricative place of production. Nevertheless, as Mann and Repp (1980) and
others (Kunisaki & Fujisaki, 1977; Whalen, 1981) have shown, the vowel also
exerts an influence on fricative perception: When the fricative noise is
ambiguous between /s/ and /ʃ/, listeners report more instances of /s/ when the
following vowel is rounded (/u/) than when it is not (/a/), resulting in a
quite substantial boundary shift on an /s/-/ʃ/ fricative noise continuum.

A number of other effects of this kind have been found in recent
research. For example, a preceding fricative noise (/s/ versus /ʃ/) affects
the perception of a following stop consonant (/t/ versus /k/): The /t/-/k/
boundary shifts in favor of /k/ when the precursor is /s/ (Mann & Repp, 1981).
The effect is independent of coarticulatory cues to stop place of articulation
in the fricative noise, and it occurs also when the fricative appears to
belong to a preceding syllable (Repp & Mann, 1981). Yet another effect
operating across a syllable boundary has been obtained by Mann (1980): The
boundary on a /da/-/ga/ continuum shifts in favor of /g/ when the preceding
syllable is /al/ rather than /ar/.

How are such segmental context effects to be accounted for?
Psychoacoustic interactions between adjacent signal portions, while not
impossible, become rather implausible. For example, there is little reason to
expect that a fricative noise would "sound" different before different vowels.
Indeed, when listeners are required to judge the "pitch" of the noise rather
than the phonetic category of the fricative, effects of the following vowel
disappear (Repp, 1981). The most plausible hypothesis is that segmental
context effects represent a perceptual compensation for coarticulatory
interactions in speech production. It is well known, for example, that
anticipatory lip rounding for rounded vowels affects the noise spectrum of
preceding fricatives (Fujisaki & Kunisaki, 1978; Mann & Repp, 1980), and there
are indications that the formant transitions of stop consonants shift with the
place of articulation of preceding fricatives (Repp & Mann, 1982) and liquids
(Mann, 1980). The ability of listeners to compensate for these coarticulatory
effects implies an internal representation of these dependencies, which may be
conceptualized in dynamic or static terms.

Segmental context effects have been demonstrated even among nonadjacent
segments. Thus, shifts in the place of articulation boundaries for initial
stop consonants have been found to occur as a function of the place of
articulation of the final stop consonant in the same syllable (Alfonso, 1981).
Perceptual interdependencies between two vowels separated by a consonant have
also been reported (Kanamori, Kasuya, Arai, & Kido, 1971). These effects may
reflect perceptual compensation for coarticulatory dependencies operating over
wider time spans (cf. Martin & Bunnell, 1981, 1982; Ohman, 1966).

Speaking Rate Effects 

The perception of phonetic distinctions that rest on temporal cues may be
affected by the temporal structure of surrounding signal portions. Since
these effects have been thoroughly reviewed by Miller (1981), we can be brief
here.

It is useful to distinguish between experimental manipulations of the
duration of selected (steady-state) acoustic segments and of time-varying
spectral changes connected with actual (or simulated) changes in articulatory
rate. Both temporal and spectro-temporal manipulations have been shown to
affect the perception of certain temporal cues, but it is not clear whether
their effects take place at the same level.

Some experiments on effects of "speaking rate" concern trading relations
among cues for the same phonetic segment. When two temporal cues contribute
to the same distinction, a change in one will necessarily require a
compensatory change in the other to maintain perceptual constancy. An example
of such a trading relation is that between (preceding) silence duration and
fricative noise duration as joint cues to the fricative-affricate distinction
(Repp et al., 1978). Affricate percepts are favored by both long silences and
short noises, so an increase in silence duration can be compensated for,
within limits, by an increase in noise duration. But when this trading
relation was examined in the context of a true rate manipulation -- the
critical cues were embedded in sentence frames produced at a fast or at a slow
rate -- relatively more silence was needed in the fast sentence frame to
maintain the same level of affricate responses. One possible interpretation
of this reliable effect (cf. Dorman, Raphael, & Liberman, 1979) is that, in
the rapidly articulated context, the (constant) fricative noise sounded
relatively longer and hence more fricative-like, so that a longer silence was
required to restore the same level of affricate responses. This assumes that
the perception of the silence cue was less affected by the rate manipulation.
Why this should be so is not clear at present. We should also remark that the
speaking rate effect was probably mediated primarily by the immediately
adjacent signal portions -- the durations of the vocalic segments preceding
the silence and following the fricative noise. If so, the speaking rate
effects observed may have been a special instance of a segmental context
effect or even a trading relation.

A good example of another "speaking rate effect" that could be put, as
well, in the preceding section on segmental context effects is the influence
of the duration of a following vowel on the perception of the /b/-/w/
distinction (Miller & Liberman, 1979): The longer the vowel, the longer the
formant transition duration at the /b/-/w/ boundary. This finding was
interpreted as a speaking rate effect, and it is indeed consistent with
observed changes in /w/ transition duration at different speeds of
articulation (Miller & Baer, 1983). However, the effect has also been
obtained with infants (Eimas & Miller, 1980) and with nonspeech stimuli
(Pisoni, Carrell, & Gans, 1983), which suggests a possible psychoacoustic
origin -- i.e., a temporal normalization early in the perceptual process. It
is indeed questionable whether changes in the duration of a (steady-state)
synthetic vowel are sufficient to convey anything like "speaking rate."
Within the context of cue trading relations, both Fitch (1981) and Soli (1982)
have been able to separate perceptual effects of vowel duration from effects
due to vowel "structure," i.e., more complex spectral changes taking place
over time. It is the latter that are more properly viewed as the carriers of
information about rate of articulation.

The examples given above illustrate that true "speaking rate effects" are
not easy to distinguish from simpler temporal trading relations and local
context effects. Moreover, if speaking rate is varied, those changes that
occur closest to the target segment will affect its perception most
(Summerfield, 1981). In addition, Miller, Aibel, and Green (1984) have
recently demonstrated that listeners' overt judgments of speaking rate do not
predict the perceptual effects of rate manipulations. On the other hand,
considering the extensive speech knowledge that listeners must possess, it
seems reasonable to assume that they also have intrinsic knowledge of the
acoustic changes that accompany changes in speaking rate and that they "know"
how to apply this knowledge in perception. An example of this was also
provided by Miller and Liberman (1979) in their study of the /b/-/w/
distinction. When the following vowel was extended by a nonstationary portion
containing transitions appropriate for a syllable-final /d/, the effect on the
/b/-/w/ boundary was equivalent to that of shortening the steady-state vowel.
This paradoxical finding presumably reflects an increase in the perceived rate
of articulation due to the additional phonetic segment in the syllable.

Speaker Normalization Effects

Phonetic boundaries along a spectral cue dimension may shift in
accordance with the size of the vocal tract that is perceived to be the source
of the utterance; that is the hypothesis, at least. As with speaking rate
effects, genuine speaker normalization effects are not easy to distinguish from
local context effects and spectral trading relations. Moreover, a
demonstration of true speaker normalization requires that the test utterance be
perceived as coming from a single source (speaker), which is possible only with
target segments that are relatively ambiguous as to their source. For these
reasons, there are few convincing demonstrations of speaker normalization
effects in the literature.

One of the earliest demonstrations was provided by Ladefoged and Broadbent
(1957), who showed that synthetic vowel targets were perceived differently
in sentence carriers simulating different speakers. This result was
replicated with natural speech by Dechovitz (1977). More recently, May (1976)
with synthetic speech and Mann and Repp (1980) with natural speech found a
shift in the /ʃ/-/s/ boundary when the same fricative noises occurred in the
context of vowels produced by different-sized vocal tracts. More experiments
along these lines are needed to establish firmly listeners' sensitivity to the
static aspects of the perceived speech source.

Semantic and Syntactic Effects 

It is a commonplace observation that listeners tend to hear what they
expect to hear. Effects of semantic context are ubiquitous in speech perception
(Bagley, 1900-1901; Cole & Rudnicky, 1983). However, these effects are
generally obtained only when some acoustic information is missing and needs to be
"filled in." Apparently, semantic factors can also influence the phonetic
boundary on an acoustic continuum characterized by ambiguous (rather than
missing) cues.

That such factors can influence the category boundary on a VOT continuum
was demonstrated by Ganong (1980). He found that the boundary shifted in
favor of word responses when one of the alternatives was a word and the other a
nonword, even though the phonetic distinction was in the initial consonant.
The pattern of the data suggested that the effect was not merely a response
bias; rather, lexical status seemed to influence phonetic categorization
directly. But this kind of direct interaction between "top-down" and
"bottom-up" processes is a controversial notion (see, e.g., Swinney, 1982), and we
do not wish to enter into a discussion of the matter here. Suffice it to
point out that phonetic boundaries may be shifted by semantic biases. Such
biases can be manipulated not only by changing the lexical status of the
target word but also by inducing expectations through preceding sentence context
(Garnes & Bond, 1977; Miller, Green, & Schermer, 1982). However, the
phonetic boundary shift obtained in that case may be eliminated by selective
attention to the target word (Miller et al., 1982), suggesting that semantic
processing can be consciously avoided in certain conditions (e.g., when the same
materials are repeated over and over). Interestingly, the same study by
Miller et al. (1982) also revealed that effects on segmental perception due to
the speaking rate of a carrier sentence could not be voluntarily disengaged.

Effects of syntactic boundaries on certain phonetic distinctions have
also been reported (Dechovitz, 1979; Price & Levitt, 1983): If the critical
cue for the distinction is silence duration (as in the fricative-affricate
contrast), more silence is needed if a syntactic boundary is made to coincide
with the silence. Although claims have been advanced that this effect can be
produced by syntactic structure per se (Dechovitz, 1979), no convincing
evidence for such "pure syntax" effects exists so far. Rather, the effects of
syntactic boundaries seem to be mediated by the prosodic changes that
accompany them. The fricative-affricate boundary may shift depending on whether the
preceding word does or does not have clause-final intonation and lengthening
(Price & Levitt, 1983; see also Rakerd, Dechovitz, & Verbrugge, 1982). To
what extent these effects should be considered merely local context effects or
temporal trading relations remains to be seen. In either case, they seem
genuinely phonetic rather than psychoacoustic.

Cross-Language Effects 

For the purpose of ruling out psychoacoustic factors and establishing
that the location of a phonetic boundary is largely determined by factors
internal to the listener, cross-language comparisons are most instructive.
Languages do differ in their articulatory-acoustic patterns, frequently even
for phonetic categories that seem phonemically identical (see Ladefoged,
1983). To the extent that these cross-linguistic differences are captured by
a single acoustic speech continuum (and this is not always the case), we
should want to know if, in fact, the phonetic boundaries differ for speakers
of different languages.

Unfortunately, cross-linguistic studies using the same stimuli and
procedures are not very numerous. Among those that do exist, most have dealt with
the voicing dimension, as cued by VOT, taking advantage of the fact that
languages such as English, French, and Thai make their voicing contrasts in
phonetically different ways. While English distinguishes voiced (either
prevoiced or voiceless unaspirated) and voiceless aspirated stops, French,
Spanish, and Polish contrast prevoiced with voiceless unaspirated stops, and
Thai makes both distinctions. The single voicing boundary for English
listeners is located in the short-lag values of VOT, between roughly 20 and 40
ms, depending on place of articulation (Lisker & Abramson, 1970). The single
boundary for French, Spanish, and Polish listeners, on the other hand, is
generally located at shorter lag times, close to zero, and is considerably more
variable (Caramazza, Yeni-Komshian, Zurif, & Carbone, 1973; Keating et al.,
1981; Williams, 1977). Thai listeners have two boundaries, one in the
voicing lead region (where none of the other languages mentioned exhibits any
boundary), and the other at voicing lags somewhat longer than in English
(Foreit, 1977; Lisker & Abramson, 1970). Thus, native language does seem to
influence the location of comparable phonetic boundaries on a VOT continuum,
and it certainly determines whether or not a boundary exists at all.
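The notion of a boundary "location" on a VOT continuum can be made concrete
with a small sketch. It assumes, hypothetically, that identification data are
summarized as the proportion of "voiceless" responses at each continuum step,
and it takes the boundary to be the 50% crossover, estimated by linear
interpolation. The function name and the response proportions are
illustrative assumptions, not values from the studies cited above.

```python
def category_boundary(vot_ms, prop_voiceless):
    """Estimate the 50% crossover on a VOT continuum by linear interpolation.

    vot_ms: ascending VOT values (ms) of the continuum steps.
    prop_voiceless: proportion of "voiceless" responses at each step.
    Returns the interpolated VOT in ms, or None if 50% is never crossed.
    """
    points = list(zip(vot_ms, prop_voiceless))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            # Linear interpolation between the two steps that straddle 50%.
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


# Hypothetical identification functions for two listener groups.
steps = [0, 10, 20, 30, 40, 50]
long_lag_group = [0.0, 0.05, 0.20, 0.80, 1.0, 1.0]
short_lag_group = [0.10, 0.60, 0.95, 1.0, 1.0, 1.0]

boundary_long = category_boundary(steps, long_lag_group)    # about 25 ms
boundary_short = category_boundary(steps, short_lag_group)  # about 8 ms
```

With these made-up data the second group's boundary falls at a shorter lag
than the first group's, which is the kind of cross-language difference
described above. A fitted psychometric function would be a reasonable
alternative to interpolation.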

There is ample evidence that discrimination performance is best in the
vicinity of a phonetic boundary. Thus, discrimination peaks shift with the
phonetic boundaries across languages. Speakers of a language such as Thai
have a discrimination peak in the voicing lead region where English listeners'
ability to detect differences is extremely poor (Abramson & Lisker, 1970).
Another well-known example of such a cross-language difference is provided by
the /r/-/l/ contrast, which is easily discriminated by English listeners but
nearly indistinguishable for speakers of Japanese, a language that does not
contain these phonetic segments (Miyawaki et al., 1975). For a review of
these and related data, see Strange and Jenkins (1978) and Repp (1984).

In view of the flexibility of phonetic boundaries, demonstrations of a
coincidence of category boundaries obtained for chinchillas or monkeys with
those of English-speaking humans lose some of their impact. To the extent
that these animal boundaries are stable at all (see Waters & Wilson, 1976, for
a demonstration of large range effects), they may reveal certain
psychoacoustic sensitivities that, however, seem to exert only a weak constraint on the
possible locations of human boundaries.

It is likely, of course, that the locations of phonetic boundaries in the
languages of the world are not totally arbitrary. The structure of the speech
production apparatus imposes universal constraints on articulation that may be
reflected in a limited number of preferred boundary locations. The hypothesis
that human infants may possess some innate sensitivity to these universal
potential phonetic boundaries (see Aslin & Pisoni, 1980) has recently gained
momentum through the remarkable findings of Werker (1982), who showed that
prelinguistic American infants are capable of distinguishing phonetic
categories foreign to English, but lose that ability around ten months of age. It
has not been conclusively established, however, that these prelinguistic
category distinctions are truly phonetic, rather than psychoacoustic, in nature.
Exposure to the phonetic distinctions of the native language may merely induce
a "speech mode" of listening in the one-year-old infant and thereby lead it to
ignore irrelevant acoustic detail. Similarly, several demonstrations of
adults' ability to discriminate foreign phonetic categories in certain
laboratory situations (MacKain, Best, & Strange, 1981; Pisoni, Aslin, Perey, &
Hennessy, 1982) may, at least in part, reflect skills of deploying a nonphonetic
mode of processing, and not the acquisition of a new phonetic distinction that
can be generalized beyond the laboratory. On the other hand, mastery of a new
language does imply the establishment of new phonetic categories, and it is
primarily a matter of implementing all the necessary controls to permit the
conclusion that this is indeed what has happened in any given laboratory
experiment. Rigorous investigations of the process of phonetic learning,
which may be a good deal slower than the time span of the typical speech
experiment, are just beginning (e.g., Flege & Port, 1981).



Conclusion 

Evidence from a variety of experiments on speech perception establishes
that phonetic category boundaries are flexible in response to each of two
quite different sets of conditions. One set is commonly created by the way
utterances are arranged in experiments that require the presentation of
sequences of test stimuli. Most of the effects of such conditions are found
with nonspeech sounds as well, though, for reasons that are not yet clear,
some may be peculiar to speech. The other conditions are the more
interesting, at least for our purposes, because they seem to be integral parts of the
processes by which utterances are perceived in any test sequence and so,
presumably, in the real-life situation. Their effects are of several
superficially different kinds, but, common to all, there is a (more or less) apparent
correspondence between the shift in the perceived category boundary and the
acoustic effects of an articulatory or coarticulatory maneuver. Thus, these
boundary shifts imply a link between speech perception and speech production,
much as if perception were constrained by tacit "knowledge" of what a vocal
tract does when it makes linguistically significant gestures. Considerations
of this kind, roughly similar to those that led originally to the (so-called)
"motor theory of speech perception" (Liberman, Delattre, & Cooper, 1952), lead
us to suppose that such boundary shifts as these are peculiar to speech.

Footnotes

¹We are uncertain where to place in the present framework another
important class of hypotheses, that of "acoustic invariance" (Kewley-Port, 1983;
Lahiri, Gewirth, & Blumstein, 1984; Stevens & Blumstein, 1978). Sometimes
invariant properties are described in terms that suggest a boundary-oriented
approach, e.g., when a spectral shape is considered to be either rising or
falling. On the other hand, the use of optimal "templates" (Stevens &
Blumstein, 1978) suggests a prototype-oriented approach. Since the invariance
hypothesis postulates invariant acoustic correlates for linguistic distinctive
features, it would seem to permit little flexibility in category boundaries,
particularly if the boundaries themselves are taken to be the invariant
correlates.

²Not all the studies we will cite actually examined boundary shifts.
Some studies showed only that the perception of a single ambiguous stimulus
could be influenced in one or the other direction. It is safe to infer,
however, that, had that stimulus been part of an acoustic continuum, the category
boundary on that continuum would have shifted in precisely the same direction.

³In the same study, however, sequential contrast was found to be
contingent on the perceived phonetic category; i.e., the effect of /spa/ differed
from that of /ba/ (Sawusch & Jusczyk, 1981). It is worth noting that, in the
selective adaptation paradigm, the adaptors are typically presented at a fast
rate that may discourage even covert categorization. Phonetic (judgmental)
effects may be contingent upon overt or covert labeling of contextual stimuli.

⁴We call them perceptual functions, rather than perceptual processes,
because we believe that these accomplishments of the perceptual system should
not be viewed in process terms. In any case, whatever neural or cognitive
processes may underlie these functions are totally unknown at present.

⁵Although there have been persistent attempts to conceptualize single
"invariant" acoustic properties for distinctive features in speech (e.g.,
Kewley-Port, 1983; Lahiri et al., 1984; Stevens & Blumstein, 1978), these
properties never fully capture the phonetically distinctive information. It seems
to be a fact to be accepted that what may be a unitary event at the levels of
linguistic structure or articulatory planning emerges in a fractionated form
at the level of acoustic description.

ON CATEGORIZING APHASIC SPEECH ERRORS*
Betty Tuller+

Abstract. Acoustic studies of voice-onset-time in aphasics' speech
suggest that errors of fluent aphasics are misselected phonemic
targets, whereas nonfluent aphasics' errors are of articulatory origin.
However, we must be cautious when extrapolating a theory from only
one measure of articulation. In this experiment, I examined
utterances produced by five fluent aphasics, five nonfluent aphasics, and
two controls. In the first part of the experiment, I replicated
previous voice-onset-time studies. Second, I examined the duration
of vowels preceding word-final stop consonants as an index of the
consonant's voicing category. The pattern of voice-onset-times
produced did not predict the pattern of vowel durations. Thus,
voice-onset-time cannot be used to characterize more generally the
output of the speaker.

Traditional clinical descriptions of aphasia consider the errors in
speech produced by posterior, fluent aphasics to originate at the phonemic or
phonological planning levels, whereas phonetic or articulatory errors are
thought to be more typical of anterior, nonfluent aphasics (Alajouanine,
Ombredane, & Durand, 1939; Luria, 1966; Shankweiler & Harris, 1966). Though
it is often difficult to disambiguate so-called planning and execution
deficits (or phonemic and phonetic deficits), a fine-grained acoustic analysis has
great potential for describing the nature of the underlying speech disorder.

Segmental analyses of aphasic speech have typically proceeded by
examining one parameter of the acoustic complex that signals a shift in one phonetic
dimension. A commonly used measure is voice-onset-time (VOT), a parameter
that distinguishes voiced from voiceless stop consonants in syllable-initial
position (e.g., Blumstein, Cooper, Goodglass, Statlender, & Gottlieb, 1980;
Blumstein, Cooper, Zurif, & Caramazza, 1977; Freeman, Sands, & Harris, 1978;
Hoit-Dalgaard, Murry, & Kopp, 1980; Itoh et al., 1980; but see Shinn &
Blumstein, 1983, for an analysis of place of articulation errors). VOT is the
acoustic representation of the time between the burst at release of
supraglottal occlusion and the onset of glottal pulsing. For voiced stop consonants in
syllable-initial position, glottal pulsing might begin before the release
burst, or lag as much as 25 ms after release. In voiceless stop consonants,
the onset of glottal pulsing might lag behind supraglottal release by
approximately 35-80 ms (Lisker & Abramson, 1964, 1967). In normal English speakers,
the actual VOT values vary somewhat as a function of, for example, place of
articulation, speaking rate, and phonetic context. Nevertheless, the
distribution of VOT values for voiced and voiceless word-initial cognates is
bimodal and more or less nonoverlapping, particularly when the words are
produced in list form.

In contrast to normal speakers, nonfluent aphasics are reported to
produce voiced and voiceless stop consonant cognates having about the same VOT
values (Freeman et al., 1978), so that the resulting distribution of VOT is
unimodal. These data have been interpreted as indicating that the underlying
phonological categories have merged. However, the data are also compatible
with the view that these speakers select the correct phonemic targets for
production, but the articulation itself is so distorted that the difference
between cognates is not maintained (at least on the VOT dimension). Blumstein
et al. (1980) attempted to examine this question directly. They operationally
defined a production error as an error in selecting the phonemic target when
the VOT value of the utterance fell within the range of the opposite voice
category, as when a required [b] was produced with a VOT value longer than 35
ms. A production error was considered to be of phonetic origin when its VOT
value fell between the normal distributions for the voiced and voiceless
categories, as when a required [b] was produced with a VOT value between 15 ms and
35 ms. In accord with previous work, Blumstein et al. found a large overlap
of VOT values for voiced and voiceless productions by nonfluent (Broca's)
aphasics, suggesting a pervasive deficit in the timing of articulatory
movements. They noted, however, that nonfluent aphasics produced some apparent
phonemic errors as well, particularly on voiceless stop consonants. In
contrast, errors produced by fluent (Wernicke) aphasics tended to fall within the
VOT range of the opposite voice category, suggesting that their errors were
primarily errors in selecting the appropriate phonemic target, although some
apparent phonetic errors were also noted.
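The operational definitions just described can be sketched in a few lines
for the bilabial case. The 15 ms and 35 ms limits are the ones given in the
text; the function name, the labels, and the choice to treat the limits as
inclusive are assumptions made for illustration, and the sketch is not a
reconstruction of the procedure used in any of the studies cited.

```python
# Middle range for bilabial stops, from the text: VOT values between 15 and
# 35 ms fall between the voiced and voiceless distributions.
BILABIAL_MIDDLE_RANGE_MS = (15.0, 35.0)


def classify_bilabial_token(target, vot_ms, middle=BILABIAL_MIDDLE_RANGE_MS):
    """Label one production of a word-initial bilabial stop.

    target: "b" (voiced) or "p" (voiceless), the required consonant.
    vot_ms: measured voice onset time in ms (negative values = voicing lead).
    """
    if target not in ("b", "p"):
        raise ValueError("target must be 'b' or 'p'")
    low, high = middle
    # Assumption: the limits themselves count as part of the middle range.
    if low <= vot_ms <= high:
        return "apparent phonetic error"
    in_voiced_range = vot_ms < low
    if (target == "b") == in_voiced_range:
        return "on target"
    return "apparent phonemic error"


# Hypothetical tokens: (required consonant, measured VOT in ms).
tokens = [("b", -80), ("b", 22), ("b", 60), ("p", 55), ("p", 5)]
labels = [classify_bilabial_token(t, v) for t, v in tokens]
```

A required [b] produced with 60 ms VOT is labeled an apparent phonemic error,
and one produced with 22 ms VOT an apparent phonetic error, which matches the
two examples given in the paragraph above.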

This description is intuitively satisfying in that it agrees with
subjective clinical impressions. However, as Blumstein et al. recognize, we
must be cautious when hypothesizing differences in the mechanisms for
production errors from only one measure of articulation. For example, even when
restricting discussion to the voicing feature, we find at least sixteen cues
that potentially influence perception (Lisker, 1978). If the pattern of
errors on the VOT dimension is truly indicative of a more general speech
disorder, then some predictions should hold true. Specifically, a speaker producing
apparent phonemic errors as reflected in VOT values might be expected to
produce a similar distribution of errors when the same phonemic target appears in
different positions in a syllable, even though the phonetic realization of the
phoneme may be quite different. For example, in English, one strong cue to
stop consonant voicing in syllable-final position is the duration of the
preceding vowel, which tends to be longer before voiced than before voiceless
consonants for both adults (House, 1961; House & Fairbanks, 1953; Klatt,
1973; Peterson & Lehiste, 1960; Raphael, 1975) and children (Raphael,
Dorman, & Geffner, 1980). Thus, if the errors are truly of phonemic selection
and have no phonetic component, aphasic speakers who produce voicing errors
that fall within the range of VOT values for the opposite voice category in
syllable-initial position should show voicing errors in syllable-final
position characterized by preceding vowel durations that fall within the range of
vowel durations occurring for the opposite voice category (i.e., a bimodal
distribution of vowel durations).

The predictions regarding apparent phonetic errors are much less clear.
Basically, the number of errors produced should be a function of the
difficulty of articulation, which might be affected by a segment's position within a
word. Unfortunately, it is as yet impossible to quantify the complexity of
articulation involved in producing quite different acoustic results. If,
however, the articulations involved in producing changes in VOT and vowel
duration are of the same order of difficulty, nonfluent speakers should show the
same distribution pattern for voicing errors in syllable-initial and
syllable-final position. If voicing production in initial position is more
difficult than in final position (as one might perhaps expect from the difficulty
aphasics often have initiating speech), we would expect a greater number of
phonetic errors in initial position than in final position. Another
possibility is that "articulatory complexity" differs across speakers. If this is so,
individual speakers might show a coherent pattern of phonetic errors across
syllable positions that is not evidenced by the clinical group.

The study reported here is an attempt to determine whether the pattern of
production errors indexed by VOT can be used to characterize more generally
the output of the aphasic speaker as containing primarily "phonetic" or
"phonemic" errors. To this end, the VOT findings of Blumstein et al. (1980)
and Itoh et al. (1980) are first replicated. Next, for the same speakers, the
duration of the vowel preceding a final stop consonant is examined. Both
acoustic dimensions are interpreted with regard to "apparent phonemic" and
"apparent phonetic" errors. In this study, errors are operationally defined
as "apparent phonemic" errors when categories are misplaced along some
acoustic dimension, though contrast is maintained. "Apparent phonetic" errors are
operationally defined as those instances of production that fall between
categories.

Method 

Subjects. The subjects in this study included five fluent (Wernicke)
aphasics (referred to hereafter as F1 through F5), five nonfluent (Broca's)
aphasics (referred to as NF1 through NF5), and two normal controls. The
fluent aphasics were articulatorily agile and used phrases of normal length.
However, their speech often made no sense. All of the nonfluent aphasics
spoke hesitantly, with long pauses between words, that is, in an "effortful"
manner. Three of the nonfluent aphasics would be characterized as agrammatic
(NF1, NF2, and NF5) and three were apractic (NF3, NF4, and NF5). The
diagnostic category of each patient was determined by performance on the Boston
Diagnostic Aphasia Examination (Goodglass & Kaplan, 1972) and other
neurological and neuropsychological tests. A list of 35 monosyllabic and polysyllabic
words and sentences (selected from a larger list provided by Darley, Aronson,
& Brown, 1975) was used to assess the presence of speech apraxia. A speaker
was diagnosed as "apractic" if production of the list contained numerous but
inconsistent phonetic errors of various types, as well as attempts at
self-correction. The errors were judged by a linguist who had no information
concerning the individual patients. In all cases, etiology was vascular and
involved only the left hemisphere (see Table 1 for additional information).
No tumor or trauma cases were included. All of the subjects were right-handed
premorbidly.



Table 1

Descriptive data for aphasic subjects

                          Years of    Year of   Auditory
Speaker Type  Age   Sex   Schooling   Onset     Comprehension(a)   Hemiplegia

Fluent
  F1          57    F     16          1972      +.7                No
  F2          67    F     16          1969      -.3                No
  F3          49    M     16          1977      +.06               No
  F4          55          10          1976      -.12               No
  F5          43    M     14          1972      -.6                No

Nonfluent
  NF1(b)      61    F     16          1979      +.2                No
  NF2(b)      66    M     12          1980      .0                 Yes
  NF3(c)      67    M     4           1979      +.7                Yes
  NF4(c)      69    M     20          1980      +1.0               Yes
  NF5(b,c)    52    M     8           1974      +1.0               Yes

(a) Mean of the four auditory comprehension subtests of the Boston Diagnostic
Aphasia Examination (Goodglass & Kaplan, 1972); (b) Agrammatism; (c) Speech
apraxia



Stimuli. The stimuli were thirty prepausal stressed
consonant-vowel-consonant words whose vowel was always [æ]; however, slight vowel quality
changes across words did occur for some speakers. The test words included
minimal pairs differing on the voicing of either the initial or final
consonant (e.g., bat vs. pat and bat vs. bad). Each word (preceded by the word
"THE") was printed in large capital letters on an index card and presented to
the subject in random order.

Procedure. Subjects were tested individually in a sound-insulated room.
On presentation of the stimulus card, subjects were required to read the
phrase aloud at least twice. If the subject was unable to read the card
easily, the experimenter would pronounce the phrase for the subject to repeat.
The randomized list of phrases was presented a minimum of eight times so that
each subject attempted to produce at least sixteen tokens of each stimulus
word. Subject responses were recorded onto a high-quality tape recorder for
later analysis.



Data analysis. Broad phonemic transcriptions of all utterances were made
by a trained linguist. Target segments transcribed with a different manner
(e.g., [m] instead of [b]) or place of articulation (e.g., [d] instead of [b])
are excluded from further report. Substitutions of, for example, [p] for
[b] were included in the analyses. VOT and vowel duration of the remaining
utterances were measured using an interactive computer program that displays
the acoustic waveform. VOT was defined as the time from the energy burst
representing initial stop consonant release to the onset of acoustic
periodicity representing vocal fold vibration. Vowel duration was defined as
the interval from the onset of acoustic periodicity (excluding any initial
aspiration) to the first acoustic evidence of closure for the final stop
consonant (the time when the high frequency components of the periodic wave
disappear). Spectrograms were also used when VOT or vowel duration could not be
measured from the acoustic waveform.
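The two measures defined above reduce to differences between hand-marked
landmarks in the waveform. The following sketch shows the arithmetic only;
the class and field names are hypothetical, and the handling of prevoiced
tokens (vowel onset taken at the burst) is an assumption that the text does
not spell out.

```python
from dataclasses import dataclass


@dataclass
class TokenLandmarks:
    """Hand-marked times (ms) within one consonant-vowel-consonant token."""
    burst_ms: float              # energy burst at release of the initial stop
    periodicity_onset_ms: float  # onset of acoustic periodicity (voicing)
    final_closure_ms: float      # loss of high-frequency energy before final stop


def voice_onset_time(tok):
    """VOT: burst to onset of periodicity; negative when voicing leads."""
    return tok.periodicity_onset_ms - tok.burst_ms


def vowel_duration(tok):
    """Vowel duration: vowel onset to closure of the final stop.

    Assumption: for prevoiced tokens the vowel is taken to begin at the
    burst, so that closure voicing is not counted as part of the vowel.
    """
    vowel_onset = max(tok.burst_ms, tok.periodicity_onset_ms)
    return tok.final_closure_ms - vowel_onset


# Hypothetical landmark times for a voiceless and a prevoiced token.
lag_token = TokenLandmarks(burst_ms=100.0, periodicity_onset_ms=160.0,
                           final_closure_ms=420.0)
lead_token = TokenLandmarks(burst_ms=100.0, periodicity_onset_ms=30.0,
                            final_closure_ms=350.0)
```

For the first token the sketch gives a VOT of 60 ms and a vowel duration of
260 ms; for the second, a VOT of -70 ms and a vowel duration of 250 ms.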

Results 

Voice Onset Time 

The frequency distribution of the VOT values was plotted individually for
each subject. Figure 1 shows examples of the distribution of VOT values for a
normal control, a fluent aphasic (subject F3), and two nonfluent aphasics
(subjects NF2 and NF1). Data from these particular aphasics are shown because
F3 and NF2 produced the expected patterns of VOT distribution but NF1 did not.
The distributions were analyzed in two ways. First, apparent phonetic and
apparent phonemic errors were catalogued using the procedure described by
Blumstein et al. (1980). Briefly, if at least two instances of crossover of VOT
values between the voiced and voiceless distributions occurred, then all VOT
values within this middle range were counted as apparent phonetic errors. The
boundaries for this middle range were taken from earlier studies of VOT values
in normal speakers (Lisker & Abramson, 1964, 1967) and were +15 to +35 ms VOT
for bilabial stops, +20 to +40 ms for alveolar stops, and +25 to +45 ms for
velar stops. For a production to be counted as an apparent mistargeting
error, its VOT value had to fall in the range appropriate for its voicing
cognate.

The results of this analysis are shown in Table 2 and are in fairly good
agreement with other reports (Blumstein et al., 1980; Freeman et al., 1978;
Hoit-Dalgaard et al., 1983; Itoh et al., 1980). A two-way ANOVA resulted in
no significant main effects of group, F(1,8) = 0.31, p > .1, or error type,
F(1,8) = 0.05, p > .1, but a significant groups by error type interaction,
F(1,8) = 5.69, p < .05. As can be seen from the totals column in Table 2,
this interaction occurred because the nonfluent aphasics as a group produced
more apparent phonetic than phonemic errors, whereas the fluent aphasics as a
group produced more apparent phonemic than phonetic errors. The columns
representing the different target sounds indicate that this differential
pattern of errors occurred for nonfluent aphasics on all of the six target sounds
but on only four of the six target sounds for fluent aphasic speakers.
Moreover, the tendency for nonfluent speakers to produce more apparent phonemic
errors on voiceless than voiced stops was not replicated. The two control
subjects produced no errors of any sort.
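For readers who wish to see where F ratios on 1 and 8 degrees of freedom
come from, the following sketch computes the three tests for a design with
one between-subjects factor (group) and one within-subjects factor (error
type), each with two levels. It relies on the standard equivalence between
such a design and two-sample tests on each speaker's sum and difference
scores. The function name and the data are hypothetical; the sketch does not
use, and is not meant to reproduce, the values reported above.

```python
from statistics import mean, variance


def mixed_2x2_anova(group1, group2):
    """F ratios for a 2 (group, between) x 2 (error type, within) design.

    Each group is a list of (phonemic_pct, phonetic_pct) pairs, one pair per
    speaker. Returns (F_group, F_type, F_interaction), each with
    (1, n1 + n2 - 2) degrees of freedom.
    """
    def f_tests(scores1, scores2):
        n1, n2 = len(scores1), len(scores2)
        df = n1 + n2 - 2
        pooled = ((n1 - 1) * variance(scores1)
                  + (n2 - 1) * variance(scores2)) / df
        m1, m2 = mean(scores1), mean(scores2)
        scale = pooled * (1 / n1 + 1 / n2)
        # Difference between the group means (a squared two-sample t).
        f_between = (m1 - m2) ** 2 / scale
        # Unweighted average of the two group means tested against zero.
        f_average = ((m1 + m2) / 2) ** 2 / (scale / 4)
        return f_between, f_average

    sums1 = [a + b for a, b in group1]
    sums2 = [a + b for a, b in group2]
    diffs1 = [a - b for a, b in group1]
    diffs2 = [a - b for a, b in group2]

    f_group, _ = f_tests(sums1, sums2)                # group main effect
    f_interaction, f_type = f_tests(diffs1, diffs2)   # within-subject tests
    return f_group, f_type, f_interaction


# Hypothetical percentages: five speakers per group, (phonemic, phonetic).
fluent = [(11, 10), (12, 10), (13, 10), (14, 10), (15, 10)]
nonfluent = [(10, 11), (10, 12), (10, 13), (10, 14), (10, 15)]

f_group, f_type, f_interaction = mixed_2x2_anova(fluent, nonfluent)
```

With these made-up scores the two groups have equal overall error rates and
opposite error-type preferences, so only the interaction term is non-zero,
which is the same qualitative pattern as the one described in the text.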

Table 3 shows the error patterns for individual speakers. Four of the
five fluent aphasics showed mostly bimodal distributions of VOT with the
majority of errors falling within the range of the other voice category
(apparent phonemic errors). For one fluent aphasic (F1), the voiced and
voiceless categories were overlapped considerably, with many errors produced in
both the apparent phonemic and apparent phonetic ranges. It is not clear from
results of the diagnostic battery why this subject differed so markedly from
the other fluent aphasics.

Figure 1. Bilabial stops: initial position. The distribution of VOT values
for bilabial stop consonants in word-initial position for a normal control, a
fluent aphasic, and two nonfluent aphasics. VOT is plotted on the abscissa of
each graph, number of productions on the ordinate. The vertical lines
crossing the four graphs at 15 ms and 35 ms represent the upper and
lower boundaries of voiced and voiceless bilabial stops,
respectively. (-) The required production was [b]; (--) the
required production was [p].

Table 2

Apparent "phonemic" and apparent "phonetic" errors expressed as a percent of
total productions for each target consonant, across speakers

                           Target Consonant
Error Type         p      b      t      d      k      g  Total

"Phonemic"
  Fluent        22.7           9.7    8.4   12.5          20.0
  Nonfluent     12.0   15.9    3.2   15.8    2.1   19.6   11.4

"Phonetic"
  Fluent         8.3    8.2   13.7   10.7    3.8    5.5    6.7
  Nonfluent     22.9   29.9   13.8   21.8   22.8   23.9   22.3



Table 3

"Phonetic" and "phonemic" errors expressed as a percent of each speaker's
total production of each consonant (Criteria: Lisker & Abramson, 1964, 1967)

                           Target Consonant
Error Type         p      b      t      d      k      g  Total

"Phonemic"
  F1               0      0      0      0      0   27.8    4.8
  F2             9.1    8.0   21.7      0      0   12.5    8.5
  F3            52.8   56.8    5.3      0   20.0   59.1   32.3
  F4            20.8   25.6      0   41.2      0   68.9   26.0
  F5            30.8   32.4   20.9      0   41.4   42.6   28.0
  NF1            3.0   14.3    2.8   23.3      0   38.2   13.4
  NF2           13.3   27.8    5.6   41.9      0   47.6   22.5
  NF3           33.3    1.8    2.8    6.1      0    5.1    7.9
  NF4              0   27.1           8.0    2.4    5.1    7.9
  NF5           10.2    8.5      0      0    8.1    2.0    4.8

"Phonetic"
  F1               0      0      0      0      0      0      0
  F2               0      0      0      0      0      0      0
  F3               0      0      0      0    8.3      0    1.3
  F4            37.5     .0   18.6   50.0    9.3   26.7   30.5
  F5             4.2      0      0             0      0    1.3
  NF1              0      0      0      0      0      0      0
  NF2           42.2   37.0   22.2   54.5   33.0   38.1   37.8
  NF3                  23.2    5.6   18.2      0      0   10.3
  NF4           26.5   63.8    6.0    7.4   20.2   41.3   27.3
  NF5           28.2   25.1   33.2   29.1   60.7   40.2   36.1



/' 
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Although all five nonfluent aphasics produced some VOT values that fell
well within the range of the target's voice cognate (apparent phonemic
errors), four of the five produced proportionally more errors having
intermediate VOT values (apparent phonetic errors). In contrast, one
nonfluent speaker (NF1) produced no VOT errors that could be characterized
as apparently of phonetic origin.

One shortcoming of this analysis is that it does not accurately reflect a
situation in which VOTs of the two voice categories are shortened or
lengthened relative to normal, whether or not they overlap. For this reason
the VOT data were reexamined to determine simply whether the distribution
for a given place of articulation was unimodal or bimodal. If the
distribution was unimodal, no delineation of apparent phonetic and phonemic
errors could be drawn. If the distribution was bimodal, we determined
whether an interval of at least 15 ms without a token separated the two
concentrations of data. If so, then all tokens that fell in the opposite
voice distribution were termed apparent phonemic errors. If the
distribution was strongly bimodal but two or three tokens occurred within
the interval between modes, 30 ms of overlap midway between the two modes
was ignored when counting apparent phonemic errors. The results of this
analysis are shown in Table 4.
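
The gap criterion described above can be illustrated with a minimal sketch. It is not the procedure actually used in the study: it assumes that the two modes are separated by the single widest token-free interval, it places the category boundary at the midpoint of that interval, and it leaves out the 30 ms exclusion rule for strongly bimodal distributions with stray tokens. The function names and the example values are hypothetical.

```python
def split_at_gap(values_ms, min_gap_ms=15.0):
    """Split sorted values at the widest token-free interval.

    Returns (low_cluster, high_cluster) when that interval is at least
    min_gap_ms wide; returns None otherwise (treated here as unimodal).
    """
    xs = sorted(values_ms)
    if len(xs) < 2:
        return None
    # Width of every interval between neighbouring tokens, with its index.
    gaps = [(xs[i + 1] - xs[i], i) for i in range(len(xs) - 1)]
    widest, idx = max(gaps)
    if widest < min_gap_ms:
        return None
    return xs[: idx + 1], xs[idx + 1:]


def phonemic_error_percent(tokens, min_gap_ms=15.0):
    """tokens: list of (target, vot_ms), target being 'voiced' or 'voiceless'.

    Returns the percent of tokens that fall in the opposite voicing cluster,
    or None when the distribution does not meet the gap criterion.
    """
    split = split_at_gap([vot for _, vot in tokens], min_gap_ms)
    if split is None:
        return None
    low, high = split
    boundary = (low[-1] + high[0]) / 2.0  # assumed midpoint boundary
    errors = sum(
        1 for target, vot in tokens
        if (target == "voiced") != (vot < boundary)
    )
    return 100.0 * errors / len(tokens)


# Hypothetical tokens: one intended [b] produced with a long, [p]-like VOT.
example = [
    ("voiced", 5), ("voiced", 10), ("voiced", 12),
    ("voiceless", 60), ("voiceless", 70), ("voiced", 65),
]
rate = phonemic_error_percent(example)
```

Because the boundary is taken from the speaker's own distribution rather than from normative VOT values, a speaker whose categories are both shifted but still separate is not penalized, which is the point of the reanalysis.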

Notice first that, as a group, the fluent aphasics still seem to produce
more apparent phonemic errors than the nonfluent group (15.0% vs. 4.3%).
However, this analysis changes one's conclusions concerning the actual
number of targeting errors that occur. For example, in the fourth plot in
Figure 1 (subject NF4), the VOT values for voiced and voiceless stop
consonants are longer than those measured for normal speakers, so that the
aphasic category boundaries do not fall at the normal category boundaries.
This does not necessarily mean, however, that the categories have merged.
Thus, errors in producing word-initial [p] that appeared in our first
analysis to be of phonetic origin appear, with this less stringent
criterion, as phonemic errors.

In both analyses of VOT, no errors were produced by the control subjects.
Interestingly, the one nonfluent speaker who produced only apparent phonemic
errors is severely agrammatic, but would not be characterized as having
speech apraxia.

Vowel Duration 

The duration of the vowel preceding voiced and voiceless final stop
consonants was measured to determine whether the resulting pattern of errors
is similar to the pattern of VOT errors. Figure 2 shows examples of the
distribution of vowel durations measured for the same normal control, fluent
aphasic (F3), and one of the nonfluent aphasics (NF2) shown in Figure 1.
However, with vowel duration, unlike VOT, one does not have a predetermined
cut-off value for accepting a token as correct or in error. Rather than
arbitrarily defining a range of durations as apparent phonetic errors, it was
determined only whether, for a given place of articulation, the distribution
of vowel durations was unimodal or bimodal. As in the second analysis of VOT,
when bimodal distributions were separated by at least 15 ms, apparent
phonemic targeting errors were counted. When seemingly bimodal distributions
were not separated by at least 15 ms, the 30 ms between the two
distributions were ignored. If the VOT results are indicative of a
"phonemic" speech disorder,
then those aphasics who produced bimodal distributions of VOT values (primari- 
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Table 4

Percent of intended target and total productions categorized as "phonemic"
errors. Criterion: Bimodality of VOT.

                  p      b      t      d      k      g  Total

  F1              0      0      0      0      0   27.8    [?]
  F2            9.1    8.0   21.7      0      0   12.5    8.5
  F3            [?]    [?]    [?]    [?]    [?]   59.5   33.2
  F4            [?]    [?]    [?]    [?]    [?]    [?]    [?]
  F5           33.2   32.4   20.9    2.2    [?]   42.2   28.7
  Mean                                                   15.0

  NF1           3.0    6.1    3.3   19.4    5.9   14.7    8.8
  NF2             *      *      *      *      *      *
  NF3             *    [?]    8.3    6.1    2.8    [?]    4.3
  NF4          36.7    6.4      0    [?]    2.4    5.1    8.6
  NF5             *      *      *      *      *      *

(*) Unimodal distribution.
[?] Value illegible or missing in the source. The row labels for F3 and F5
are inferred from row order, and for NF4 the position of the missing entry
(t or d) is uncertain.
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Table 5

Percent of intended target and total productions categorized as "phonemic"
errors on the basis of vowel duration.

                  p      b      t      d      k      g  Total

  F1            8.6      0    8.6   11.1    3.1    8.6    6.7
  F2            8.7      0      0   13.8    4.2    4.2    5.2
  F3              *      *    6.7   11.1      *      *    3.3
  F4              *      *    3.3    9.4    7.1      0    3.6
  F5              0      0      *      *   10.2   12.5    5.7
  NF1             *      *      *      *      *      *
  NF2          12.5   12.8    6.7    7.7    8.7   15.0   10.6
  NF3             *      *      *      *      *      *
  NF4             0   47.4      0    9.4    2.8   26.3   14.5
  NF5             0      0      0   12.4   21.3   13.5    7.9

(*) Unimodal distribution.
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BILABIAL STOPS: FINAL POSITION 
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Figure 2. The distribution of vowel duration measures for bilabial stop
consonants in word-final position for a normal control, a fluent
aphasic, and a nonfluent aphasic. Vowel duration is plotted on the
abscissa, number of productions on the ordinate. (-) The required
production was [b]; (--) the required production was [p].
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ly apparent phonemic errors) should show bimodal distributions of vowel
durations.

Table 5 shows the results of this analysis. Notice first that, as a
group, the fluent aphasics produced more bimodal distributions of vowel
duration (irrespective of number of errors) than did the nonfluent group,
although many individual differences within groups are apparent. It is also
obvious from comparison of Tables 4 and 5 that the distribution of apparent
phonemic errors produced by both fluent and nonfluent speakers is not
equivalent for word-initial and word-final positions. Subject F1 produced
bimodal distributions of both VOT and vowel duration. Although she produced
no apparent phonemic errors on [t] and [d] in word-initial position, in
word-final position 8.6% of her productions in which [t] was the required
target, and 11.1% of productions in which [d] was the required target, were
apparent phonemic errors. F2 also produced bimodal distributions of both VOT
and vowel duration, with the t/d distinction producing the most apparent
phonemic errors on both measures. However, in word-initial position the
voiced alveolar was substituted for the voiceless, whereas in word-final
position the voiceless alveolar substituted for the voiced. Interestingly,
this reversal is the consequence of an inappropriate shortening of both VOT
and vowel duration. F3 produced bimodal VOT distributions for all three
places of articulation, but a bimodal distribution of vowel duration only for
the alveolar stops (the category with fewest errors on VOT). F4 produced only
unimodal distributions of VOT but bimodal distributions of vowel duration for
the velar and alveolar stops. Moreover, the errors in word-initial position
greatly outnumbered errors in word-final position. F5 produced many apparent
phonemic errors at all places of articulation, as indexed by VOT. In
contrast, vowel duration measures indicated apparent phonemic errors only for
the velar stops.

With regard to the nonfluent aphasics, one agrammatic speaker (NF1)
produced bimodal distributions of VOT values for all places of articulation,
but she produced only unimodal distributions of vowel duration. The two other
agrammatic speakers (NF2 and NF5) showed the opposite pattern, with unimodal
distributions of VOT and bimodal distributions of vowel duration. NF3
produced a unimodal distribution of vowel duration for all places of
articulation, but a unimodal distribution of VOT only for bilabial stops. NF4
produced bimodal distributions of VOT and vowel duration values for bilabial,
alveolar, and velar stops. However, errors in word-final position occurred
predominantly on voiced consonants, a pattern not reflected in word-initial
errors. Again, for many of these errors the measured acoustic duration was
inappropriately short. As expected, the two normal speakers produced
error-free bimodal distributions of vowel duration in these word lists.

In summary, those patients (fluent and nonfluent) who produced apparent
phonemic errors in word-initial position did not necessarily produce those
errors in word-final position. The result casts doubt on the conclusion that
a production whose value on one acoustic dimension is appropriate to its
cognate is indicative of a general impairment in phonemic targeting.

The regularity of apparent phonetic errors can also be questioned given
the data in Tables 3, 4, and 5. As previously mentioned, it is possible to
demarcate only an arbitrary region of vowel durations within which
productions are categorized as apparent phonetic errors. Thus, a unimodal
distribution of measured vowel durations was considered to have "many" apparent
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phonetic errors, a bimodal distribution to have "none," and a primarily
bimodal distribution with scattered intermediate data points to contain "few"
apparent phonetic errors. By these rather loose criteria, no consistency was
apparent either within or across speakers. Two of the five fluent aphasics
(F1 and F2) produced no apparent phonetic errors on word-initial or word-final
stop consonants. Speaker F3 produced only a few apparent phonetic errors on
voiceless velar stops in initial position but many apparent phonetic errors on
final bilabial and velar stops. Speaker F4 produced many apparent phonetic
errors on word-initial stop consonants at all three places of articulation,
but only on word-final bilabial stops. F5 produced many apparent phonetic
errors only on alveolar stops in word-final position. Of the five nonfluent
speakers, two (NF1 and NF3) produced more apparent phonetic errors on final
than initial stops. This occurred at all places of articulation for NF1, but
only for alveolars and velars for NF3. In contrast, NF2 and NF5 produced many
apparent phonetic errors on word-initial stops but none on word-final stops.

Discussion 

The results of the first part of this study converge with previous
reports of voice-onset-time production by aphasic speakers (Blumstein et al.,
1980; Freeman et al., 1978; Hoit-Dalgaard et al., 1983; Itoh et al., 1980).
Using the VOT boundaries established by Lisker and Abramson (1964, 1967) it
was determined that nonfluent aphasics as a group produced more apparent
phonetic than apparent phonemic errors, whereas fluent aphasics as a group
produced more apparent phonemic than phonetic errors. It does not necessarily
follow, however, that those speakers who produce primarily apparent phonetic
errors have merged the voicing categories. When VOT values were examined to
determine simply whether the resulting distribution was unimodal or bimodal
(ignoring the absolute VOT value), four of the five fluent aphasics and three
of the five nonfluent aphasics showed evidence of bimodal patterns. Thus it
appears that for these speakers separate voicing categories were preserved.

The major result of this study is that each speaker's pattern of errors
on word-initial stop consonants (as measured by VOT values) is not a good
predictor of the error pattern on word-final stops (as indexed by vowel
duration). For each subject, the number of apparent phonemic errors differed
radically across positions. In order to attribute the bulk of the errors
produced by fluent aphasics to incorrect selection of phonemic targets, one
would have to suppose that the selection of phonemic targets is sensitive to
the phoneme's position within a word. There are, in fact, theories that
consider a word's representation in the mental lexicon to be phonologically
ordered in a left-to-right manner (e.g., Cutler & Fay, 1982; Fay & Cutler,
1977). However, this accounts neither for the unimodal distributions of VOT
and vowel duration produced by fluent aphasics nor for the lack of
consistency across subjects as to whether more apparent phonemic errors were
produced on word-initial or word-final stops.

With regard to apparent phonetic errors, I had hoped to find some
consistent pattern, at least for the nonfluent speakers, indicating that
adequate control of the interval between release of supraglottal occlusion
and the onset of glottal pulsing was more difficult than control of the
duration of voicing, or vice versa. However, the pattern and number of errors
on initial stop consonant production was unrelated to the pattern and number
of errors on final stop production. This may be because 1) the apparent
phonetic errors
are independent of articulatory complexity, 2) these speakers are brain-dam- 
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aged so that our (admittedly weak) "metric of articulatory complexity" for
normal speakers is not appropriate, or 3) articulatory complexity varies among
speakers. Furthermore, the nonfluent speakers did not group on the basis of
presence or absence of speech apraxia or agrammatism.

In conclusion, it appears that (at least for this small sample of aphasic
speakers) the pattern of errors on the voice-onset-time dimension cannot be
used to characterize the total output of the speaker. These data also
indicate that the traditional alignment of fluent aphasics with phonemic
errors and nonfluent aphasics with phonetic errors is inadequate as a
description of aphasic speech production. More generally, we should recognize
that phonetic and phonemic aspects of speech are not necessarily independent.
Clearly much more acoustic and physiological information is needed before we
can ascribe the constellation of fluent and nonfluent aphasic errors to
primarily phonetic or phonemic origins.
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Abstract. The present study represents a test of our hypothesis
that degree of vowel-to-vowel coarticulation is related to the number
and distribution of contrastive vowels in a language. Comparison
of vowel-to-vowel coarticulation occurring in Swahili, English,
and Shona indicates that there are indeed cross-language differences
in the magnitude of coarticulation. Swahili and Shona, which have
five-vowel systems, exhibit more coarticulation on vowels than
English, which has a considerably larger vowel inventory. The
relationship between number of vowels and coarticulation suggests that
coarticulation is not simply a by-product of the demands of fluent
speech on motor planning and execution. Motor systems, while yielding
to the demands of fluent speech, appear to be constrained by the
necessity of maintaining distinctiveness, which for each language is
defined in the phonology.

Recently, we have been working on a model of coarticulation that focuses
on constraints on variability in the acoustic space. In this model we
consider phonemes to be associated with target areas as opposed to canonical
target points. Coarticulation effects are viewed as a by-product of moving
from area to area, rather than deviation from canonical points. It follows
from this view that the magnitude of coarticulation should depend upon the
size of the target areas.

We hypothesize that there are certain universal principles that tend to
constrain the size of target areas, and therefore the magnitude of
coarticulation. One obvious candidate for restricting the size of target
areas is the need to maintain distinctiveness. We would predict, then, that
in general, languages with fewer vowels can allocate more space to each vowel
area than languages with larger vowel inventories. This hypothesis is based
on the premise that the division of the vowel space into distinctive areas is
entirely determined by the number of vowels in a particular system. However,
while the number of vowels is a major factor in predicting vowel
distribution, there
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are arbitrary, language-particular aspects of vowel distribution that cannot
be predicted from any universal principles. Therefore, while we expect that,
for example, seven-vowel systems allocate a smaller area to each vowel than
three-vowel systems do, this expectation must be qualified by the fact that
not all seven-vowel languages use the total vowel space equally or in the same
ways. Within a given language, particular vowels may have more area, and
consequently more coarticulatory freedom, in some dimensions than in others.
For example, in English there is a binary distinction in the front/back
dimension, but several contrastive levels of height. We therefore might expect
coarticulation to be freer in the front/back dimension than in the up/down
dimension. Keeping in mind all of the language-particular constraints that are
not predictable by general principles of distribution, we would still expect
the number of vowels in a system to have great predictive power for the size
of individual vowel areas. We predict that, in general, languages with
smaller vowel inventories can allocate more space to each vowel area than
languages with large vowel inventories. If this is the case, and if the size
of areas itself determines magnitude of coarticulation, then languages with
fewer vowels ought generally to exhibit larger vowel-to-vowel coarticulation
effects than languages with more vowels. Essentially, this suggests that
coarticulation reflects not only universal motor constraints, but
language-particular organization as well.

We have begun to test the specific hypothesis that languages with fewer
vowels show more vowel-to-vowel coarticulation than languages with larger
vowel inventories. In this paper we present preliminary results from studies
of three languages: Swahili, Shona, and English.

Swahili is a five-vowel Bantu language spoken in southeastern Africa,
principally in Kenya and Tanzania. Because Swahili has a smaller vowel
inventory than English, we expect Swahili vowels to be more affected by
coarticulation than those of English. Based on Öhman's (1966) data,
vowel-to-vowel influences appear to be restricted to VC and CV transitions in
English and Swedish. If coarticulatory effects are less constrained in
Swahili, then we might expect to find that vowel-to-vowel influences extend
into the steady state portions of the Swahili vowels.

Swahili has a typical five-vowel system, /i,e,a,o,u/. A male Swahili
speaker produced five repetitions each of all possible vowel combinations in
VpV and VtV disyllables in a carrier phrase "Nilipata VCV jana" (I received
VCV yesterday). In Swahili, the penultimate syllable of a word is stressed,
and therefore all VCVs in this experiment were stressed on the first vowel.
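
The size of this recording design can be checked with a short enumeration. The sketch below is only bookkeeping under the design as stated (five vowels in each position, two medial consonants, five repetitions); the variable names are illustrative and nothing here reflects the actual recording order.

```python
from collections import Counter
from itertools import product

VOWELS = ["i", "e", "a", "o", "u"]
CONSONANTS = ["p", "t"]
REPETITIONS = 5

# Every V1-C-V2 disyllable, repeated five times.
utterances = [
    (v1, c, v2, rep)
    for v1, c, v2 in product(VOWELS, CONSONANTS, VOWELS)
    for rep in range(REPETITIONS)
]

# Each utterance contributes one first-vowel and one second-vowel token.
tokens_per_vowel = Counter()
for v1, _c, v2, _rep in utterances:
    tokens_per_vowel[v1] += 1
    tokens_per_vowel[v2] += 1

print(len(utterances), dict(tokens_per_vowel))
```

The enumeration gives 250 utterances and 100 measured tokens of each vowel, 50 in first position and 50 in second position.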

Formant trajectories for the vowels were obtained by means of LPC
analysis. The values of F1 and F2 in the center of the longest stretch of
minimally varying F1 and F2 values were recorded. Figure 1 is a plot of all
100 tokens of each vowel in F1/F2 space, showing a large amount of
variability for each vowel. In order to determine how much of this
variability is attributable to context, we performed separate four-way
analyses of variance on F1 and F2. In each analysis there were four "between"
factors: target vowels, flanking vowels, consonants, and positions (first
versus second vowel).
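
One way to make the steady-state measurement concrete is sketched below. It is an illustration only: the text does not say how "minimally varying" was operationalized, so the frame-to-frame tolerance (`tol_hz`) and the function name are assumptions, and the formant tracks are taken to be per-frame values already produced by an LPC analysis.

```python
def steady_state_center(f1, f2, tol_hz=20.0):
    """Return (F1, F2) at the centre of the longest stable stretch.

    A stretch is a run of consecutive frames in which both formants change
    by no more than tol_hz from one frame to the next (assumed criterion).
    """
    if len(f1) != len(f2) or not f1:
        raise ValueError("f1 and f2 must be non-empty and equally long")
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(f1)):
        stable = (abs(f1[i] - f1[i - 1]) <= tol_hz
                  and abs(f2[i] - f2[i - 1]) <= tol_hz)
        if not stable:
            start = i  # a new candidate stretch begins at this frame
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    mid = best_start + (best_len - 1) // 2  # lower-middle frame of the run
    return f1[mid], f2[mid]


# Hypothetical tracks in Hz: transition, steady portion, transition.
f1_track = [300, 400, 500, 505, 510, 508, 600]
f2_track = [2000, 1900, 1500, 1490, 1495, 1500, 1300]
center = steady_state_center(f1_track, f2_track)
```

With diphthongized vowels the longest stable run can be short, so the choice of tolerance matters; any real analysis would need to justify it.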
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Figure 2. Overall effects of each of the five contextual vowels on the
F1/F2 values of vowels in Swahili.
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Given our hypothesis, we expected that vowels would strongly influence
one another across the intervening consonant. Vowels preceded or followed by
/i/, for example, should be higher (lower F1 values) and more forward (higher
F2 values) than vowels preceded or followed by /a/. In fact, we did find a
highly significant and systematic effect of vocalic environment, F(4,400) =
12.73, p < .001 for F1 and F(4,400) = 12.97, p < .0001 for F2.

The effects of each of the contextual vowels are shown in Figure 2, in
which the means of all vowels for each flanking vowel context are plotted.
For example, the symbol "i" is the mean of all vowels when /i/ is flanking,
and it shows the effects that /i/ exerts on target vowels. As the figure
shows, F1 and F2 shift in the generally expected directions, so that this
figure resembles the Swahili vowel space depicted in Figure 1.
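
The context means plotted in Figure 2 amount to grouping every measured vowel by its flanking vowel and averaging F1 and F2 within each group. A minimal sketch follows; the function name and the example values are hypothetical and are not measurements from the study.

```python
from collections import defaultdict


def mean_by_flanking_vowel(tokens):
    """tokens: iterable of (flanking_vowel, f1_hz, f2_hz).

    Returns {flanking_vowel: (mean_f1, mean_f2)}, pooling over all target
    vowels, consonants, and positions, as in an overall context plot.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for flank, f1, f2 in tokens:
        entry = sums[flank]
        entry[0] += f1
        entry[1] += f2
        entry[2] += 1
    return {v: (s[0] / s[2], s[1] / s[2]) for v, s in sums.items()}


# Hypothetical tokens: vowels flanked by /i/ sit higher and further forward.
example_tokens = [("i", 300, 2000), ("i", 400, 1800), ("a", 600, 1200)]
context_means = mean_by_flanking_vowel(example_tokens)
```

Because each mean pools over all five target vowels, a shift in these means reflects the influence of the flanking vowel rather than the identity of any one target.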

Before we look more closely at how individual target vowels reflect
vocalic context, we will examine some other contextual influences on vowels.
The effects of intervocalic /p/ versus intervocalic /t/ are shown in Figure 3,
in which all target vowels with a particular flanking vowel and a particular
intervocalic consonant are plotted. Vowels have a significantly higher F2 in
the context of /t/, F(1,400) = 223, p < .0001. This may reflect forward
lingual articulation associated with the /t/ and/or a lowering of F2 in the
context of labial /p/. However, if the effect was due to labialization in the
/p/ contexts, we would expect F1 as well as F2 to be lowered, but there is no
significant effect of consonant on F1, F(1,400) = 2.27, p > .1. This suggests
that the consonant effect is probably due to the lingual movements associated
with /t/. Apparent in this figure is the fact that even /t/, which itself
involves a tongue gesture, does not block vowel-to-vowel coarticulation.

The amount and type of vowel-to-vowel coarticulation is affected by
position. This is shown in Figure 4. The area marked carryover represents the
effects of particular first vowels on the mean of all second vowels. The
lowercase letters indicate the flanking first vowels and the ways in which
they influence the average of all second vowels. The area marked anticipation
represents the effects of second vowels on the mean of all first vowels.
Anticipatory effects are large, as the figure indicates, and they are
statistically significant in both the F1 and F2 dimensions. On the other hand,
carryover coarticulation is significant only for the F2 dimension. (For F1,
there was a significant interaction of position by flanking vowel, F(4,400) =
12.47, p < .0001. Separate ANOVAs for each position revealed a highly
significant effect of second vowels on first vowels, F(4,200) = 27.63, p <
.0001. However, vowels in second position were not significantly affected by
first vowels, F(4,200) < 1.0. For F2, there was no interaction of position
with flanking vowel.) Overall, anticipatory coarticulation exceeds carryover
coarticulation. This is particularly striking since the first vowel was
always the stressed vowel. We will return to this point when we present the
data from English.

We will now consider how individual vowels are affected by vocalic
context. We will examine the effects of second vowels on first vowels across
the medial consonant /p/ in order to simplify the comparison of Swahili with
English and Shona, as our data set is limited to VpVs in the latter two
languages.

Figure 5 shows the effects of anticipatory coarticulation on each of the
five vowels. Within each loop we show the effects of each of the five second
vowels on each of the five first vowels. The small letters represent the
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Figure 3. Overall effects of contextual vowels across medial /p/ and medial
/t/ on the mean F1/F2 values of vowels in Swahili.



Figure 4. Anticipatory vs. carryover effects of coarticulation in Swahili.
(Axis label recoverable from the plot: F2 in Hz.)

Manuel & Krakow: Vowel-to-Vowel Coarticulation

Figure 5. Effects of anticipatory coarticulation on each of the five Swahili
vowels.
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Figure 6. Anticipatory effects of coarticulation in Swahili on the mid vowels.
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flanking second vowels. Analysis indicates that the target by flanking vowel
interaction is significant for F1, F(16,400) = 2.23, p < .01, and for F2,
F(16,400) = 4.50, p < .0001. Simple main effects for individual vowels show
that /e/, /o/, /a/, and /u/ are significantly affected in both F1 and F2
dimensions (p < .05 in all cases), while /i/ shows significant influence of
flanking vowels only in the F2 dimension (p < .05). Clearly, anticipatory
vowel-to-vowel coarticulation in Swahili is free enough to extend into the
steady state portion of each of the vowels.

We now turn to some of the details of vowel-to-vowel effects for other 
target vowels. The vowel /a/, for example, is affected by the F2 and F1 di- 
mensions of flanking vowels. It appears that the front vowels, /i/ and /e/, 
pull /a/ forward, with /i/ also raising /a/'s articulation. The back vowels, 
/o/ and /u/, pull /a/ back and up. 

The patterns for the mid vowels are depicted in Figure 6, which is a
magnified plot of the effects of flanking vowels on /e/ and /o/. These look
almost like mirror images. Clearly the anticipated flanking vowels exert a
systematic effect along the height dimension of both mid vowels. The second
formant is not affected as we might expect if both backness and rounding are
anticipated. Of course, from the acoustics alone it is not generally possible
to tease apart the relative contributions of lingual, jaw, and labial
gestures.

We have begun to model these coarticulatory effects on an articulatory
synthesizer. Preliminary work suggests that much of the acoustic patterning
for these vowels can be accounted for by moving the tongue backwards or
forwards in anticipation of the upcoming vowel and by moving the jaw in the
anticipated direction (which, of course, automatically raises or lowers the
tongue along with it). Based on the acoustic data and the articulatory
modeling, it appears that these vowels do not reflect the roundedness of the
following vowel. It may well be that rounding per se is not contrastive in
Swahili. In any case, these data suggest that not all of the configurations
of individual articulators are anticipated in the steady state portion of the
previous vowel. Nevertheless, the overall vocal tract shape, as reflected in
the acoustic data, does show effects of coarticulation.

As predicted, the amount of vowel-to-vowel coarticulation observed for
this speaker of Swahili is greater than that reported by Öhman for speakers of
English and Swedish. However, Öhman's study of vowel-to-vowel coarticulation
was based on spectrographic measures, whereas we used LPC analysis. Although
our Swahili data show more extensive coarticulation than Öhman found in
English, this could be due to the difference between LPC and spectrographic
analysis procedures. Therefore, we examined English VpV disyllables, using
Linear Predictive Coding (LPC) analysis and measuring the most steady state
portion of the vowels. As in the case of Swahili, the English VCVs were
stressed on the first vowel and embedded in a carrier phrase. The vowels we
have analyzed for a single speaker of English are /i, e, a, o/, and the
contextual vowels are /i, e, a, o, u/. There were some difficulties in
extending our analysis to English since the diphthongal nature of its vowels
makes identification of steady state portions somewhat more difficult.
Nevertheless, we are confident in the reliability of our measures.
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Figure 7. Carryover effects of coarticulation in English.
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Using LPC techniques, we did find some coarticulatory effects in the
portions of English vowels that we had identified as most steady state.
However, these effects were not as large as those found in Swahili, and were
restricted to F2, F(4,160) = 14.76, p < .0001 (for F1, F < 1.00). As discussed
earlier, Swahili vowels were significantly affected in both F1 and F2
dimensions. Additionally, anticipatory coarticulation exceeded carryover
coarticulation in Swahili. However, in English, carryover effects of
coarticulation were significantly greater than anticipatory effects. (There
was a significant position by flanking vowel interaction for F2, F(4,160) =
6.14, p < .001. Separate ANOVAs for each position showed that while for
position one there was a significant effect of the flanking vowel, F(4,80) =
5.08, p < .005, the effect in second position was much larger, F(4,80) =
15.54, p < .0001.) It may be that directionality of vowel-to-vowel
coarticulation is a language-particular phenomenon.

The relative magnitude of coarticulation in Swahili and English can be
seen by comparing Figures 5 and 7. In Figure 7 we have plotted the effects of
carryover coarticulation in English. As shown in the figure, the effects are
small except for the target vowel /o/. The effects are also less regular than
in Swahili. In fact, the main effect of flanking vowels on the F2 of target
vowels in English may be inflated as a result of the coarticulatory effects
exhibited by the target vowel /o/. (There is a significant target by flanking
interaction for F2, F(12,160) = 5.23, p < .0001.) Comparing this figure with
Figure 5, which shows the anticipatory effects of coarticulation in Swahili,
it can be seen that the effects for Swahili are much greater and also seem to
be more regular.

We have done the same type of analysis on VpV disyllables in Shona,
another five-vowel Bantu language. The magnitude of coarticulatory effects is
fairly large in Shona, as shown in Figure 8, in which we have plotted the
anticipatory effects of coarticulation. Shona, in fact, patterns like Swahili
with respect to magnitude of coarticulation. That is, F1 and F2 both show
significant effects of coarticulation, F(4,160) = 3.32, p < .05 for F1, and
F(4,160) = 3.57, p < .01 for F2. Additionally, anticipatory effects exceed
carryover effects, as we had observed in Swahili.

In summary, comparative analysis of vowel-to-vowel coarticulation in
Swahili, Shona, and English supports the hypothesis that, in general,
languages with fewer vowels vary more as a function of vocalic context than
languages with larger vowel inventories. The number of vowels in a system to a
great extent predicts facts about distribution of vowels in the system.
However, it is the distribution itself that crucially restricts variation, and
there are language-particular determinants of distribution that are not
predictable solely by the number of vowels. Thus, for example, in English,
which has a relatively large vowel inventory, movement is minimally restricted
in the F2 dimension since relatively few vowels occupy the same horizontal
plane. We suggest that motor systems, while yielding to the demands of fluent
speech, are constrained by the necessity of maintaining distinctiveness. This
is a universal principle that results in cross-language variability in
coarticulation because distinctiveness is defined for each language in its
phonology.

The data presented here provide preliminary support for this hypothesis.
We recognize the limitations of generalizing from a single speaker of each of
three languages. Clearly, it is necessary to extend this type of analysis to
additional speakers and languages. Additionally, it is important to support
the acoustic data with more direct measures of articulatory movement. We have
begun to gather additional acoustic data along with articulatory data from
several more speakers.
Abstract. Speech is surely a complex coordinated activity, but the
processes underlying such coordination are not well understood. We
show here that articulatory patterns in response to prolonged (1.5
s) and short (50 ms) duration jaw perturbations are not fixed, but
are highly specific to the utterance that the speaker produces. In
two experiments, an unexpected constant-force load (5.88 Newtons)
applied during upward jaw motion for final /b/ closure in /baeb/
revealed near-immediate compensation in upper and lower lips, but
not the tongue. The same perturbation applied during the utterance
/baez/ evoked rapid and increased tongue muscle activity for /z/
frication, but no active lip compensation. Although jaw perturbation
represented a threat to both utterances, no perceptible distortion
of speech occurred. That a challenge to one member of a group
of potentially independent articulators is met, on the very first
perturbation experience, by remotely linked members of the group
supports the hypothesis that speech is coordinated through functional
synergies (coordinative structures). A third experiment converged
on this interpretation by varying the phase of the jaw
perturbation during the production of bilabial consonants. Remote
reactions in the upper lip were observed only when the jaw was
perturbed during the closing phase of motion, that is, when the
reactions were necessary to effect bilabial closure. Thus, coordinative
structures are not rigid forms of neuromuscular cooperation; rather,
they are flexibly assembled to perform specific functions.

The bewildering complexity of human speech is readily apparent when one
attempts to track the spatiotemporal activities of the many anatomical
structures involved. One needs little persuasion that talking constitutes an
extraordinary feat of motor control, particularly if each degree of freedom
were to be individually controlled. A notion that has gained some limited
recognition in neuroscience (e.g., Evarts, 1982; Nashner, Woolacott, & Tuma,
1979; Soechting & Lacquaniti, 1981) and behavior (e.g., Bernstein, 1967;
Fowler, Rubin, Remez, & Turvey, 1980; Kelso, Southard, & Goodman, 1979;
Turvey, 1977) is that the degrees of freedom of any articulator system
(however one counts them) are not individually regulated during purposive
activity. Rather, in many actions ranging, for example, from locomotion to
handwriting, ensembles of muscles and joints exhibit a unitary structuring: a
preservation of internal relations among muscles and kinematic components
that is stable across scalar changes in such parameters as rate and force
(see Grillner, 1982, and Kelso, 1981, for reviews). It appears, then
(Bernstein, 1967; Boylls, 1975; Gelfand, Gurfinkel, Tsetlin, & Shik, 1971;
Greene, 1972, 1982; Turvey, 1977), that the significant units of control and
coordination are functional groupings of muscles and joints (referred to as
functional synergies or coordinative structures) that act as a unit to
accomplish a task. Therefore, insights into the cooperative behavior among
articulators during speech lie in the identification and analysis of
coordinative structures.

A window into the behavior of complex systems possessing active,
interacting components and large numbers of degrees of freedom can be gained
by perturbing them dynamically during an activity and examining how the free
variables reconfigure themselves. Thus, a group of potentially independent
muscles could be said to comprise a single functional unit if it were shown
that a challenge experienced by one (or more) members of the group was
responded to by other members of the group at a site remote from the
challenge. For the concept of coordinative structure, the response of the
articulatory ensemble would not be stereotypic; rather, it would be adapted
quickly and precisely to accomplish the task. In the case of speech, some
components of the neuromuscular apparatus would cooperate in such a way as to
preserve the linguistic intent of the speaker.

Although the speech literature contains a number of observations that
suggest a coordinative structure mode of articulatory organization, few
experiments have employed dynamic perturbation analysis. By and large the
"perturbations" introduced to the system have been of a "static" nature.
Thus, patterns of cooperation have been observed in various articulators
following the fixing of the jaw (as in bite-block experiments, e.g., Fowler &
Turvey, 1980; Kelso & Tuller, 1983; Lindblom & Sundberg, 1971), restrictions
on lip movements (e.g., Riordan, 1977; Tuller & Fitch, 1980), surgical
removal of the alveolar plate or reconstruction of the mandible (e.g.,
Zimmermann, Kelso, & Lander, 1980), the insertion of palatal prostheses
(e.g., Hamlet & Stone, 1978), and so on. Generally, the ability of the speech
system to compensate for these disturbances is quite remarkable. However, in
many of these studies, various kinds of preadjustments could have occurred
before the test utterances were actually produced. Thus, a more illuminating
method may be to perturb the articulators during the speech act and then
observe consequent movement patterns, if any, and the speed with which they
are achieved.

A pioneering experiment by Folkins and Abbs (1975) did precisely this by
occasionally loading the jaw during the closure movement for the initial /p/
in the utterance "a /hae paep/ again." Lip closure was attained in all cases,
apparently by exaggerated displacements and velocities of the lip closing
gestures, particularly by the upper lip. Similarly, Folkins and Zimmermann
(1982) used electrical stimulation to produce unexpected depression of the
lower lip prior to and during bilabial closure. Compensatory changes in jaw
and upper lip movements were observed to effect the bilabial gesture.
Although these findings are consistent with the coordinative structure
concept, it is not clear from existing data whether, in fact, the patterns of
articulator coupling following jaw perturbations are in any sense
standardized (as one might predict if they were completely preprogrammed or a
result of fixed input-output loops) or whether they are "functional," i.e.,
directed to the stable production of the intended utterance. If the former,
the pattern of response to a given jaw perturbation should be the same
regardless of utterance. If the latter, different patterns of articulator
cooperation (coordinative structures) should occur, tailored to the
particular phonetic requirements.

In the first two experiments reported here, we examined the effects of
jaw perturbation on productions of two phonetic segments, /b/ and /z/. For
/b/, the primary vocal tract constriction is created normally by bilabial
closure. For /z/, the main constriction is produced by positioning the tongue
in close approximation to the palate or teeth. Note that from a low vowel
environment jaw and lips cooperate for production of /b/, whereas jaw and
tongue cooperate in the raising gesture for /z/. Thus, if the jaw is perturbed
during the transition into the final /b/ in /baeb/, then the primary response
should occur in the lips, rather than, say, the tongue. In contrast, if the
same perturbation is applied during the jaw raising for the final /z/ in
/baez/, the primary response should occur in the tongue, not the lips.
Experiment 1 presents an initial exploration of this idea. Experiment 2
provides more detailed electromyographic and kinematic evidence for
task-specific articulator cooperation. A third experiment attempts to
converge on the interpretation of the first two experiments by examining
remote reactions to jaw perturbation as a function of the phase of jaw motion
at which loads are applied. For example, upper lip responses should only be
observed when the jaw is perturbed during the closing gestures for bilabial
consonant production, that is, when the upper lip contributes to vocal tract
occlusion.

Experiment 1 

Subject, Materials, and Procedures

One adult male (one of the authors) participated in the first two
experiments reported here. The speech sample contained two utterance types,
"a /baeb/ again" and "a /baez/ again." In the first part of the experiment, 30
trials of each utterance were performed in a single block. On 20% of the
trials (6 randomly selected trials out of 30 for each utterance) a load
perturbation was applied to the jaw during the closing gesture for the second
consonant, /b/ or /z/. The perturbation was triggered during /baeb/ and /baez/
when the jaw reached the same predetermined point approximately midway through
its upward trajectory. The experiment was performed with a constant force
load of 1.5 s duration. Exactly the same procedure was repeated in the second
part of the experiment, but with a 50 ms load. It is important to note that
the subject did not know on which trials he would be perturbed. Moreover,
until the first perturbed trial, the subject was unaware of the specific locus
of the perturbation during the raising trajectory and the magnitude of the
applied load.
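
The random assignment of loaded trials described above can be sketched in a few lines of Python. This is only an illustration of drawing 6 perturbed trials out of 30 without replacement; the function name, the seed, and the use of Python's `random` module are assumptions, since the report says only that a programmable microcomputer specified the loaded trials.

```python
import random

def make_schedule(n_trials=30, n_loaded=6, seed=None):
    """Return a list of booleans, True on trials that receive the jaw load.

    Hypothetical helper: the loaded trials are drawn without replacement,
    so the speaker cannot predict which trial will be perturbed.
    """
    rng = random.Random(seed)
    loaded = set(rng.sample(range(n_trials), n_loaded))
    return [i in loaded for i in range(n_trials)]

# One block of 30 trials with 6 loaded trials (20%), as in Experiment 1.
schedule = make_schedule(seed=1)
print(sum(schedule), "of", len(schedule), "trials are loaded")
```

Calling `make_schedule(40, 10)` gives the 25% proportion used in the second experiment.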

Apparatus and Data Recording 

Figure 1 illustrates the experimental set-up. The subject sat in a
dental chair with his head fixed in a specially designed cephalostat
(basically a plaster cast mold constructed for the subject's head and a clamp
that fitted onto the bridge of the subject's nose, all enclosed in a wooden
box, Figures 1A and 1B). A custom-made titanium dental prosthesis fitted onto
the subject's lower teeth (Figure 1C). Two small rods of the prosthesis
protruded from the sides of the mouth and were coupled by a thin wire to a
brushless DC torque motor that was situated perpendicular to the subject's
chin. A load cell placed in series with the coupling wire monitored applied
torque. This enabled us to control the torque motor under force feedback and
made it possible to couple the motor to the jaw with a very small tracking
load of approximately 30 g. Jaw movements were monitored by a rotary voltage
displacement transducer placed at the axis of rotation of the sector arm (see
Figure 1B). The existence of the tracking force had no perceptible effects on
the subject's speech, nor on observed movement and EMG activity. The
experiments were completely controlled by a programmable microcomputer that
specified on which trials the load was to be added and the magnitude of the
load. In each experiment the load was the same (5.88 Newtons), and the
rise-time to peak load was small, on the order of 2-3 ms.
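
To put the two forces on a common scale, the following Python sketch converts between newtons and gram-force. It assumes that the "30 g" tracking load is a gram-force value and uses standard gravity; both are assumptions, as the text does not state the convention.

```python
G = 9.80665  # standard gravity in m/s^2 (assumed conversion constant)

def newtons_to_gram_force(force_n):
    """Convert a force in newtons to gram-force."""
    return force_n / G * 1000.0

def gram_force_to_newtons(grams):
    """Convert a gram-force value to newtons."""
    return grams / 1000.0 * G

load_gf = newtons_to_gram_force(5.88)    # the perturbing load, in gram-force
tracking_n = gram_force_to_newtons(30)   # the tracking load, in newtons
ratio = 5.88 / tracking_n                # perturbing load relative to tracking load

print(f"load = {load_gf:.0f} gf, tracking = {tracking_n:.2f} N, ratio = {ratio:.0f}")
```

Under these assumptions the perturbing load is roughly 600 gf, about twenty times the tracking load.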

Infrared light-emitting diodes were attached at the vermilion border of
the subject's upper and lower lips at the midline, and sensed by an optical
tracking system (a modified SELSPOT system). The displacements of the
articulators and the acoustic speech signal were stored on FM tape for later
computer analysis. A set of software routines was used to differentiate the
movement signals and display the audio output along with movement information
in a time-synchronized format. The acoustic recordings were inspected to
determine the first evidence of bilabial closure for the final /b/ in /baeb/
trials (defined here as the point when the high frequency components of the
periodic wave disappear) and of frication onset for /z/ in the /baez/ trials
(defined as the onset of high frequency, low amplitude noise).
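
Differentiating a sampled displacement signal to obtain velocity can be sketched as follows. The report does not say which numerical scheme its software routines used; the central-difference rule, the function name, and the sample values below are assumptions chosen for illustration.

```python
def differentiate(samples, dt):
    """Estimate velocity from evenly sampled displacement values.

    Interior points use a central difference; the two end points fall
    back to one-sided differences. dt is the sampling interval.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    velocity = [0.0] * n
    velocity[0] = (samples[1] - samples[0]) / dt
    velocity[-1] = (samples[-1] - samples[-2]) / dt
    for i in range(1, n - 1):
        velocity[i] = (samples[i + 1] - samples[i - 1]) / (2.0 * dt)
    return velocity

# Hypothetical displacement samples (arbitrary units), one per time step.
displacement = [0.0, 1.0, 4.0, 9.0]
print(differentiate(displacement, dt=1.0))
```

A sudden change in the sign of the resulting velocity trace is the kind of event the jaw velocity records show at torque onset.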

Results and Discussion

In this experiment, we evaluated the effect of the jaw perturbation on
upper and lower lip movement, and whether the effect was context-sensitive.
We first established that the 1.5 s load prevented the jaw from reaching its
usual position, by measuring jaw height at the earliest acoustic evidence of
lip closure or frication. The results are presented in Table 1, which shows
the mean articulator positions for the jaw, lower lip plus jaw, and upper lip,
obtained from an arbitrary reference point. For both phonetic contexts, the
jaw was significantly lower during 1.5 s load trials than for the immediately
preceding unloaded trials, t(10) = 26.99, p < .001 and t(10) = 3.18, p < .05,
for /baeb/ and /baez/, respectively.
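
The reported degrees of freedom, t(10), are what an independent two-sample comparison of 6 loaded and 6 unloaded trials would give (6 + 6 - 2 = 10). The Python sketch below computes a pooled-variance t statistic of that form. That the analysis took exactly this form is an inference from the degrees of freedom, not something the text states, and the position values are hypothetical placeholders rather than data from the study.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance of the two samples.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1.0 / na + 1.0 / nb))
    return t, df

# Hypothetical jaw heights in mm: 6 loaded trials and the 6 unloaded
# trials that immediately preceded them.
loaded = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
unloaded = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
t, df = pooled_t(loaded, unloaded)
print(f"t({df}) = {t:.2f}")
```

With 10 trials per condition, as in the second experiment, the same function yields 18 degrees of freedom.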
Figure 1. A. The general experimental set-up. B. A schematic of the subject
in the head apparatus showing placement of LEDs for movement
tracking and electrodes for monitoring EMG activity. OOS and OOI
are orbicularis oris superior and inferior, respectively. GG is
the genioglossus, a major tongue muscle (see text for details).
C. A specially designed jaw prosthesis. Note gaps for missing
teeth that afford a unique capability for setting the prosthesis
firmly in the mouth of the subject (see Footnote 3).
Table 1

Mean Articulator Position (and sd) in mm

                  At Onset of Closure           At Onset of Frication
                        /baeb/                         /baez/
Load            Control        Load            Control        Load

1.5 s
  Jaw           41.4 (.41)     35.4 (.35)**    42.3 (.01)     34.6 (.01)**
  Lower lip     23.3 (.01)     23.0 (.46)      23.3 (.01)     22.8 (.46)*
  Upper lip      2.3 (.39)      1.6 (.47)*      6.1 (.24)      5.8 (.41)

50 ms
  Jaw           41.2 (.41)     40.5 (.81)      42.2 (.01)     41.9 (.42)
  Lower lip     23.1 (.23)     23.0 (.23)      23.3 (.23)     23.3 (.01)
  Upper lip      2.6 (.18)      2.1 (.36)*      5.5 (.44)      5.6 (.39)

*p < .05
**p < .00
Measured from an arbitrary reference position. The lower the number for a
given articulator, the lower is its spatial position.



The coordinative structure concept predicts one consequence of this
difference in jaw height, namely, that upper lip displacement downward should
increase when producing /b/, but not /z/, when the jaw load is applied. The
displacement of the upper lip downward in each trial was measured at the time
of acoustic onset of final /b/ closure or final /z/ frication. As predicted,
the position of the upper lip at final /b/ closure was lower for the perturbed
trials than for the immediately preceding unperturbed trials, t(10) = 2.64,
p < .05. In contrast, there was no difference in upper lip position for /z/
with and without a load, t(10) = 1.44, p > .1. In addition, the position of
the lower lip in space at the point of closure for /b/ was unaffected by the
1.5 s load, indicating a considerable adjustment for the lower jaw position,
t(10) = 1.65, p > .1. Similarly, for /z/, although the lower lip is lower in
space, t(10) = 2.68, p < .05, the difference is small compared to the much
lower jaw position. These lower lip reactions will be considered in more
detail in the following experiment.

When the applied load was of 50 ms duration, no effect of perturbation
was apparent on jaw position by the time closure or frication was achieved,
t(10) = 2.02 and 1.57 for /baeb/ and /baez/, respectively, ps > .05. Lower
lip position also showed no effect of the 50 ms load, t(10) = 1.05 for /baeb/
and .42 for /baez/, ps > .1. Although upper lip position for /z/ was
similarly unaffected by this short-duration load, t(10) = 0.26, p > .1, the
upper lip in /b/ did increase its downward deflection in loaded trials
relative to unloaded trials, t(10) = 2.96, p < .05. The change in upper lip
displacement, but not lower lip, is most probably a function of an increase
in compression of the upper lip.

To summarize, these preliminary observations suggest that a disruption in
movement of one articulator (the jaw) is responded to by another, remote
articulator (the upper lip), when the phonetic context is one for which that
reaction is functionally appropriate. However, the experiment has three
shortcomings. First, although we have provided evidence of a coordinative
structure during /b/ production, we have not provided direct evidence for its
presence in /z/ production. Second, in order to understand the articulatory
system's response to perturbation, both detailed kinematic and
electromyographic information are desirable. Third, and relatedly, in order
to evaluate the reliability of the effects described in Experiment 1, a
greater number of trials is warranted. For example, in Experiment 1 it may be
that the 50 ms load had a slight effect on articulatory movements (as
suggested by the increase in upper lip displacement for /b/), but six loaded
trials are insufficient to comprise a sensitive enough test. For these
reasons, we performed a second experiment, similar in many respects to
Experiment 1. In Experiment 2, the total number of trials was increased and,
in addition to monitoring jaw and lip movements, electromyographic (EMG)
potentials from tongue and lip muscles were obtained. We were especially
interested in evaluating tongue muscle activity during /z/ production.

Experiment 2 
Subject, Materials, and Procedures

The same subject who participated in Experiment 1 took part in this
study. The speech sample contained the same two utterance types as in
Experiment 1, "a /baeb/ again" and "a /baez/ again." In each part of the
experiment, 40 trials of each utterance were performed in two 20-trial blocks.
At least 5 s separated individual trials. On 25% of the trials (10 randomly
selected trials out of 40 for each utterance) a load (5.88 Newtons) was
applied to the jaw during the closing gesture for the second consonant, /b/ or
/z/. The load was triggered during /baeb/ and /baez/ when the jaw reached the
same predetermined point approximately midway through its upward trajectory.
Once again, the subject knew that some of the trials would be perturbed but
not which ones. Nor did the subject experience any form of loading (except
the tracking load) until the experiment proper. The first part of the
experiment was performed with a constant force load of 1.5 s duration, the
second part with a 50 ms load. The utterance order was counterbalanced across
loading conditions.

Apparatus and Data Recording

The jaw loading device and the methods of tracking movements of the jaw,
upper lip, and lower lip were identical to the previous experiment. In
addition to these kinematic measures, EMG potentials from a muscle in the
upper lip (orbicularis oris superior, OOS) and a muscle in the lower lip
(orbicularis oris inferior, OOI) were obtained using paint-on electrodes,
while EMG potentials from a tongue muscle (the posterior portion of
genioglossus, GG) were obtained using bipolar hooked-wire electrodes inserted
by our resident laryngologist, Dr. Kiyoshi Honda. The genioglossus recordings
were used as an index of tongue activity during /z/ production. The
displacements of the articulators, EMG from tongue and lip muscles, and the
acoustic speech signal were stored on FM tape for later computer analysis.
Software routines were used to differentiate the movement signals, ensemble
average the rectified EMG signals, and display the audio output synchronized
with movement and EMG information.
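
Ensemble averaging of rectified EMG can be sketched as below: each trial is full-wave rectified (absolute value) and the trials are then averaged sample by sample. The sketch assumes that the trials are of equal length and already aligned to a common line-up point such as torque onset; the function name and the sample values are hypothetical.

```python
def ensemble_average_rectified(trials):
    """Rectify each trial and average across trials at every sample index.

    trials is a list of equal-length lists of raw EMG samples that have
    already been aligned to a common event.
    """
    if not trials:
        raise ValueError("need at least one trial")
    length = len(trials[0])
    if any(len(trial) != length for trial in trials):
        raise ValueError("all trials must have the same length")
    n = len(trials)
    return [sum(abs(trial[i]) for trial in trials) / n for i in range(length)]

# Two hypothetical three-sample trials in arbitrary units.
emg_trials = [[1.0, -2.0, 3.0], [-3.0, 2.0, -1.0]]
print(ensemble_average_rectified(emg_trials))
```

Rectifying before averaging matters: averaging the raw signals first would let positive and negative deflections cancel.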
Results and Discussion

First we established once more that the upward jaw trajectory differed in
loaded and unloaded trials. The position of the jaw in each trial was
measured at the earliest acoustic evidence of final /b/ closure or /z/
frication. Position of the jaw in loaded trials was then compared to normal
conditions and was significantly lower for both /baeb/, t(18) = 10.20,
p < .001, and /baez/, t(18) = 22. ^, p < .001. In Figure 2, a sample of the
jaw velocities is shown for the first eight perturbed trials of both /baeb/
and /baez/ utterances for one subject. The effect of the load perturbation
was to alter the direction of jaw movement almost immediately in a very
consistent manner. That is, the jaw velocity became sharply negative just
after torque onset. Loaded trials show very small trial-to-trial variability
in the jaw velocity profiles for both utterances.

The displacements and velocities of the upper lip, the lower lip (with
the contribution of jaw subtracted out), and the jaw itself are shown for
perturbed and unperturbed ("control") trials in Figures 3 and 4. Each trace
represents the average of 10 tokens, with the dotted trace indicating the
control utterances and the solid trace the perturbed utterances. The vertical
line in each window of the figures marks the onset of torque to the jaw. Even
though the torque prevented normal upward jaw motion, lip closure for /b/ and
frication for /z/ were attained on all trials. In /baeb/, for example, peak
lower and upper lip displacement occurred on the average 5 ms before and 5 ms
after acoustic closure, respectively, on control trials, and 11 ms and 7 ms
on the average after acoustic closure on perturbed trials. Thus, the timing
differences among articulators were small between perturbed and unperturbed
utterances, and we were not able to hear any obvious differences in the
utterances between the two conditions.

Examination of the kinematics in Figures 3 and 4 and corresponding
rectified and averaged EMG in Figure 5 reveals interesting adjustments in
response to jaw perturbation. Figure 3A shows that the downward displacement
of the upper lip in /baeb/ is greater than its unperturbed control. Measured
at the acoustic onset of final /b/ closure, this difference is highly
significant, two-tailed t(18) = 3.19, p < .01. In contrast, for /baez/
(Figure 3B) the upper lip shows no displacement differences between perturbed
and control conditions, t(18) = .OOP, p > .1, when measured at the onset of
/z/ frication.

One anomalous result is that OOS (Figure 5, top) shows an active increase
in EMG activity with an average latency of 20 ms in response to the added load
for both /baeb/ and /baez/ (SD = 18. ms). Thus, even though there are
differential movement effects in /baeb/ and /baez/ as a function of
perturbation, the EMG response, at least in terms of its timing, is similar in
both utterances. Although puzzling, several, perhaps related, interpretations
of this result are possible. One is that although in /baez/ there was little
vertical upper lip displacement, the subject was observed to protrude the lips
slightly, a maneuver that could be revealed by measuring horizontal
displacement. The present study, however, does not allow us to evaluate this
possibility. Relatedly, there are some suggestive hints in the data shown in
Figures 4 and 5 that the jaw and upper lip may be functionally coupled in
/baez/ as well as /baeb/. The increase in EMG that is time-locked to jaw
perturbation, combined with a small increase in upper lip downward velocity
(Figure 4B), renders this interpretation viable. Alternatively, the EMG
response to perturbation in both /baeb/ and /baez/ may only reflect a general
stiffening
Figure 2. The consistent reaction of the jaw to a constant force load (5.88
Newtons, 1.5 s) applied during closure for the final consonant in
/baeb/ and /baez/. Velocity changes direction abruptly in response
to torque. The traces are raw data and represent the first eight
of a set of ten perturbation trials presented randomly in a
sequence of 40 trials. The remaining two traces were very similar
but are not shown because of a graphics display limitation.
Figure 3. A and B. Upper lip, lower lip (with jaw movement contribution
subtracted out), and jaw displacement for the utterances /baeb/ and
/baez/. Each trace represents the average of 10 tokens for
perturbed (solid line) and control (dotted line) conditions. The
vertical line in each window marks the onset of torque to the jaw.
For illustration purposes, the two conditions have been overlaid by
temporally sliding the control condition, which does not have a
torque line-up point, relative to the perturbed condition, which
does, taking the jaw as a reference point.
Figure 5. A and B. Average rectified electromyographic activity of upper lip
(OOS), lower lip (OOI), and tongue (GG) muscles for perturbed (solid
trace) and control (dotted line) conditions.



in the upper lip rather than active trajectory control. Further research is
needed to evaluate these possibilities.

In contrast to the upper lip kinematics, the lower lip exhibits
compensatory movement behavior in both /baeb/ and /baez/ utterances (Figures 3
and 4). Examination of displacement and velocity profiles reveals a rapid
increase in lip kinematic values when the jaw is perturbed. The near-immediate
and highly consistent response of the lower lip to perturbation is shown for
individual tokens in Figure 6. The onset delay of the increase in lower lip
velocity (seen as an inflection point in the closing gesture for /baeb/ and as
a sharp velocity spike in /baez/) is on the order of 5 to 10 ms. As an
interesting aside, the trajectory difference between the lower lip in /baeb/
and /baez/ before perturbation suggests that the lower lip is not involved
ordinarily in producing /z/ but is involved in /b/ production (see also
averaged data in Figures 3 and 4).

The almost immediate response of the lower lip to jaw loading and the
fact that there are no significant increases in OOI activity (Figure 5, middle
row) for either utterance indicate that the lower lip perturbation response is
a passive mechanical effect that arises when jaw motion is abruptly halted.
In addition, the highly stereotypic lower lip reaction to jaw perturbation
contrasts with other perturbation studies in speech that show considerable
trial-to-trial variability in articulator movements. For example, in response
to a brief perturbation applied to the lower lip, Abbs and Gracco (1983; in
press) find reciprocal trade-offs in amplitude between upper and lower lip
Figure 6. A and B. The very rapid and consistent lower lip reaction, seen as
an inflection in the velocity trace, to perturbations of the jaw
for /baeb/ and /baez/. The plotting convention is identical to
that shown in Figure 2.
movements as well as in associated muscle activity. In so-called "active
compensation," different (but systematic) magnitudes of movement and EMG
activity in coupled articulators appear to be the rule (see also Hughes &
Abbs, 1976). The stereotypy evident in the present lower lip data, however,
is more indicative of a passive shearing of the lower lip from the jaw,
arising as a consequence of the momentum created by halting jaw motion.

One important feature of the lip closure response to perturbation should
not be overlooked, namely, that the lips do not meet at the same point in
space as they do in control conditions. In Figure 3A, for example, the
amplified response of the lower lip alone (solid line) does not mean that the
lower lip is more elevated in perturbed than control conditions. In fact, the
opposite is true because the increase in lower lip displacement is smaller
than the decrease in jaw height created by loading. Thus, not only is the
upper lip lower in space in perturbed relative to control conditions, but the
lower lip is also, t(18) = 3.20, p < .01. What seems important here is that
closure, not some spatial target, is achieved (cf. MacNeilage, 1970, 1980, for
a discussion of the status of target theories in speech).

The passive reaction of the lower lip contrasts with the active
compensation to jaw loading evident in tongue muscle activity for /baez/. When
EMG responses from genioglossus are aligned and averaged with respect to the
onset of /z/ frication, the increased amplitude in perturbed trials relative
to control trials is highly significant, t(18) = 7.76, p < .001. Again, like
the lips in /baeb/, the EMG response in /baez/ is time-locked to the
application of torque (see Figure 5B) and occurs remarkably quickly (range
20-30 ms). No such differences in tongue muscle activity occur for /baeb/,
t(18) = .88, p > .10.

The pattern of reactions to perturbations of the same magnitude but of
much shorter duration (50 ms) was similar in some respects to those discussed
above but with some marked differences. Figures 7 and 8 display the kinematic
variables of displacement and velocity for each articulator and Figure 9 shows
corresponding EMG data. One difference that is immediately apparent is that
the articulators for both /baeb/ and /baez/ quickly return to their normal
trajectories following the offset of the perturbation (compare Figures 3 and 4
with Figures 7 and 8). In fact, by the time closure is achieved, there are no
significant displacement differences between perturbed and control conditions
in the upper lip for /baeb/, t(18) = 0.1, p > .1. Differences in the
amplitude of muscle activity in the tongue for /baez/ come close to, but miss,
significance, t(18) = 1.84, p > .05.

This homeorhetic property of the articulatory trajectories (i.e., a tendency
to return to a "preferred" trajectory) has been observed before in studies of
human finger (e.g., Kelso & Holt, 1980) and monkey arm movements
(cf. Bizzi, Chapple, & Hogan, 1982) and has led to the proposal that
trajectory is an actively controlled variable (Bizzi et al., 1982). However,
the present data display lightly damped spring-like behavior; the return to a
normal jaw trajectory, for example, is preceded by an overshoot response. Thus,
homeorhesis may arise as a consequence of the behavior of a dynamic system and
need not require the assumption of active trajectory control.
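
The point that a return to the original trajectory, preceded by an overshoot, can fall out of passive dynamics may be illustrated with a toy simulation. The sketch below is not a model of the jaw: it assumes a unit-mass linear spring with viscous damping, displaced by a brief constant force pulse, and all parameter values (natural frequency, damping ratio, pulse size) are arbitrary choices for illustration.

```python
import math


def simulate(zeta, f_nat_hz=5.0, pulse_force=1.0, pulse_s=0.05,
             total_s=2.0, dt=0.0005):
    """Unit-mass spring with damping ratio `zeta`, hit by a brief force pulse.

    Returns displacement samples relative to the rest position (0.0).
    Integration is semi-implicit Euler: velocity is updated first, then
    position is updated with the new velocity.
    """
    omega = 2.0 * math.pi * f_nat_hz        # natural frequency, rad/s
    k = omega ** 2                          # stiffness for unit mass
    c = 2.0 * zeta * omega                  # damping coefficient
    x, v, xs = 0.0, 0.0, []
    for i in range(int(round(total_s / dt))):
        t = i * dt
        force = pulse_force if t < pulse_s else 0.0   # 50 ms perturbation
        a = force - c * v - k * x
        v += a * dt
        x += v * dt
        xs.append(x)
    return xs


light = simulate(zeta=0.2)   # lightly damped system
```

With these assumed values the displacement is pushed away from rest by the pulse, swings past rest to the opposite side once the pulse ends, and then settles back, with no controller tracking a reference trajectory.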

In summary, though the present findings are preliminary they are
nevertheless consistent with coordinative structure theory, particularly when
recent work on speech and other motor activities is also considered. For
example, the highly flexible character of the EMG and kinematic patterns
observed in Experiments 1 and 2 shares a likeness to recent studies of cat
locomotion in which adaptive reactions are also evident (cf. Forssberg, 1982,
for review). For instance, when light touch or a weak electrical shock is
applied to the paw during the flexion phase of the cycle, an abrupt withdrawal
response occurs as if the cat were trying to lift its leg over an obstacle.
When the same stimulus is applied during the stance phase of the cycle, the
flexion response (which would make the animal fall over) is inhibited, and the
cat responds with added extension (cf. Forssberg, Grillner, & Rossignol,
1975). The so-called "stumble corrective reaction" is present in intact and
spinal animals and, like the forms of interarticular cooperation we have
observed, occurs remarkably quickly. The earliest flexor burst in response to a
tactile stimulus applied during the swing phase, for example, occurs with a
latency of 10 ms. Just as these reactions are non-stereotypic and functionally
suited to the requirements of locomotion, so the patterns obtained in our
experiments appear to be flexibly tailored to meet phonetic requirements.

Figure 7. A and B. Upper lip, lower lip (with jaw movement contribution
subtracted out) and jaw displacement for the utterances /baeb/ and
/baez/. Each trace represents the average of 10 tokens for
perturbed (solid line) and control (dotted line) conditions. The
vertical line in each window marks the onset of torque to the jaw.
In this case a torque of 5.88 N is applied for only 50 ms.

Figure 9. EMG profiles corresponding to kinematic data for briefly perturbed
(solid lines) and control trials. Each trace is the average of 10
tokens.

In a final experiment we attempt to converge on the task-specific nature
of coordinative structures by asking, in a manner akin to the research
discussed above, whether the cooperative behavior among articulators is
sensitive to the phase of motion during which an unexpected perturbation is
applied. For example, does perturbing the jaw during the opening phase of the
utterance /baeb/ induce a remote reaction in the upper lip? Since the upper lip
is minimally (if at all) involved in the opening, vowel-producing phase, we
would not expect to see a remote response in that phase unless the system were
rigidly coupled. However, in the closing phase (i.e., the transition out of
the vowel into the final consonant), where the upper lip is actively involved
in the closing gesture, the upper lip should respond to a sudden lowering of
the jaw and lower lip. In addition to the question of remote reactions, we
wanted to examine possibly phase-dependent responses in the structures local
to the perturbation, namely, the lower lip and the jaw itself.

Experiment 3

Subject, Materials, and Procedures

One subject, an adult male who was not one of the authors, and who had
never participated in a perturbation study, took part in this experiment (see
footnote 3). The speech sample contained two utterance types, "/baeb/ again"
and "/baep/ again." Eighty trials of each utterance were performed in a
single block, for a total of 160 trials. In each block, 12.5% of the trials
were perturbed during the opening phase of jaw motion, and 12.5% on the closing
phase. The jaw was perturbed at the same predetermined position in both
phases of the motion. As before, a constant force load of 5.89 Newtons and
lasting 1.5 s was delivered to the jaw via a torque motor attached to a
custom-made dental prosthesis. Between perturbations, the motor exerted a 30
g tracking force that did not perceptibly impede or alter normal articulation.
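
The trial proportions given above can be made concrete with a short sketch. It assumes that the 12.5% figures apply within each utterance type (10 opening-phase and 10 closing-phase perturbations per 80 trials) and that trial order was randomized across the block; neither detail is stated in the text, and the function name and seed are hypothetical.

```python
import random


def make_schedule(n_per_utterance=80, frac_per_phase=0.125, seed=0):
    """Return a shuffled list of (utterance, condition) pairs for one block."""
    n_pert = round(n_per_utterance * frac_per_phase)   # perturbed trials per phase
    schedule = []
    for utterance in ("/baeb/ again", "/baep/ again"):
        schedule += [(utterance, "opening")] * n_pert
        schedule += [(utterance, "closing")] * n_pert
        schedule += [(utterance, "control")] * (n_per_utterance - 2 * n_pert)
    # Assumed randomization: the speaker cannot anticipate the condition.
    random.Random(seed).shuffle(schedule)
    return schedule


block = make_schedule()
```

Under these assumptions three quarters of the trials are unperturbed, so neither the occurrence of a load nor its phase is predictable from trial to trial.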

Once again, jaw and upper and lower lip movements were optically tracked
using a modified SELSPOT system. In addition, EMG potentials from OOS and OOI
were obtained from noninvasive surface (paint-on) electrodes. It is important
to note that the subject knew neither which trials would be perturbed, nor the
phase of jaw motion that would be loaded. An additional level of uncertainty
was present, therefore, in this experiment. Movement and EMG data, and the
audio signal were recorded for later off-line processing.

Results and Discussion 



The following analysis of the movement trajectories is based largely on
differences in peak articulator positions between perturbed and control trials
for opening and closing phases of the respective gestures. First we show that
the load systematically influenced jaw motion as intended. Figure 10 shows
four pairs of jaw movement trajectories, corresponding to the four conditions
examined. Each pair represents the averaged trajectories for all the
perturbed and control trials belonging to that loading phase and phonetic
context. During the opening phase of jaw movement, the perturbed trajectories,
denoted by the heavier line, rapidly diverge downward after load onset. At
the point of maximum opening for the vowel, they are much lower, t(14) = 4.63
and t(17) = 4.59, ps < .001, for /baeb/ and /baep/, respectively.⁶ Note also
that the jaw trajectories are still lower at the point of peak raising for the
final consonant, t(14) = 5.21 (/baeb/) and t(17) = 4.26 (/baep/), ps < .01.
This is perhaps not surprising, because the load remains on for 1.5 s. When
the load is applied during the closing phase of motion, the jaw trajectories,
as expected, are not different at peak jaw lowering for either /baeb/, t(12) =
-.20, or /baep/, t(18) = -1.73, ps > .10. Following load onset, however, the
trajectories again diverge, and the loaded jaw remains much lower at stop
closure in both phonetic contexts, t(12) = 8.69, p < .01 for /baeb/ and t(18) =
5.23, p < .01 for /baep/. It is clear, therefore, that load application in
both phases of the motion had the intended effect on the jaw trajectories.

Figure 10. Four pairs of jaw movement trajectories corresponding to the four
experimental conditions examined. The thin lines are the average
unperturbed, control trials. Thick lines represent the mean
perturbed trajectories.

In Figures 11 and 12, we show the extent to which "local" reactions occur
in the lower lip in response to jaw perturbation for the utterances /baeb/
(Figure 11) and /baep/ (Figure 12). In the figures, lower lip position is
shown in absolute space as it rides the jaw (the LLJ traces) and with the jaw
motion subtracted out (the LL traces). The traces along the bottom of the
figure are averaged, but unsmoothed, signals for a lower lip muscle (OOI),
which is active for bilabial closure. Stippled portions denote increased
muscle activity in perturbed (the thicker line) relative to control trials.

Like the jaw, the lower lip-jaw complex shows a reaction to the jaw load
during the opening phase of motion. Measured at maximum lowering, LLJ is
perturbed downward in both /baeb/, t(14) = 6.03, and /baep/, t(17) = 5.96, ps <
.01. Again, since the load remains on, the lower lip-jaw combination remains
lower at the point of peak closure on perturbed trials, t(14) = 3.71, p < .01
(/baeb/) and t(17) = 4.75, p < .01 (/baep/). When the jaw is loaded during
the closing phase of motion, we see a difference between perturbed and control
LLJ traces only at the point of peak closure, t(12) = 6.08, p < .01 for /baeb/
and t(18) = 5.38, p < .01 for /baep/. As expected, the trajectories are not
significantly different at peak lowering, i.e., before the load is applied,
t(12) = -.47 and t(18) = -1.55, ps > .10 for /baeb/ and /baep/, respectively.

In Figures 11 and 12 we show also the lower lip alone (LL) responses to
perturbation in the opening phase. Independently of jaw lowering, the lip
traces diverge rapidly after load onset and are reliably lower at peak opening
for the vowel after jaw loading in both /baeb/, t(14) = 5.55, and /baep/, t(17)
= 6.00, ps < .01. A marked increase in orbicularis oris inferior activity
accompanies the lower lip response. A conservative estimate of the mean
latency in OOI is 20 ms, with a 15-35 ms range. Although the mean lower lip
position (relative to control) is not as high at closure in conditions when
the jaw is loaded during the opening phase, the effect is highly variable and
nonsignificant, t(14) = -1.06, p > .10 for /baeb/ and t(17) = -1.31, p > .10
for /baep/.

On the right hand side of Figures 11 and 12 is shown the average lower
lip response to perturbations applied during the closing phase of jaw motion.
The peak closure displacements are not different between perturbed and control
trials for either /baeb/, t(12) = -1.2, p > .10, or /baep/, t(17) = .53, p >
.10, suggesting that the lower lip has completely compensated for the lower
jaw position. Again, there is a noticeable OOI reaction some 30 ms on the
average after load onset, although this may in part reflect overall stiffening
of the lower lip (note the generally elevated posture of the lower lip after
peak closure has occurred). As expected, the lip trajectories are not
different prior to load onset, that is, at peak lower lip depression, t(12) =
-.79, p > .10 for /baeb/ and t(18) = .86, p > .10 for /baep/.

Local movement and EMG reactions occur in response to jaw perturbations
that are introduced in both opening and closing phases of the gestures. The
very pronounced OOI activity when the load occurs during the opening phase of
jaw motion may be indicative of the upcoming requirement of lip closure.
Since the mean lower lip position (independent of jaw movement) is lower as a
result of the perturbation, it must move further and more rapidly to
contribute to bilabial closure. Hence an increase in muscle activity is not
surprising. The active changes in lower lip muscle activity in this subject
contrast with the passive "shearing" effects exhibited by a different subject
in Experiment 2 (and possibly in Experiment 1 as well). Note that the form of
the jaw trajectories in the same phonetic context (/baeb/) is also
dramatically different between subjects. For the first subject the jaw was
essentially halted by a load applied during the raising trajectory (see
Figure 3). For the subject in this experiment, the load had more of a
resistive effect on the jaw trajectory. These between-subject differences in
jaw trajectory in reaction to a load may influence the extent to which a
structure linked to the jaw (the lower lip) actively participates. A sudden
halting of the jaw may cause a shearing response in the lower lip, whereas a
reduction in the magnitude of the load or a stronger jaw reaction to the load
may be associated with a more active neuromuscular response in locally linked
articulators. A systematic manipulation of load magnitude could help resolve
this question.

Figure 11. Average lower lip plus jaw (LLJ) and lower lip alone (LL)
trajectories for the utterance /baeb/ under perturbed (thick line) and
control conditions. OOI is the rectified and averaged, but
unsmoothed, electromyographic response of a lower lip raising
muscle, orbicularis oris inferior. The thicker line denotes
perturbed responses.

[Figure 12 panels: /baep/, opening phase and closing phase. Traces: load,
LLJ, LL, OOI. Scale bars: 10 mm, 150 µV.]

Figure 12. Average lower lip plus jaw (LLJ) and lower lip alone (LL)
trajectories for the utterance /baep/ under perturbed (thick line) and
control conditions. OOI is the rectified and averaged, but
unsmoothed, electromyographic response of a lower lip raising
muscle, orbicularis oris inferior. The thicker line denotes
perturbed responses.

... for the utterance /baep/ under perturbed (thick line) and control
conditions. OOS is the rectified and averaged, but unsmoothed, ...

Although we do not expect the patterns of cooperation among articulators
to be identical among subjects, we do predict (provided anatomical limitations
have not been violated) that the integrity of the phonetic act will be
preserved. What then of phase-dependent remote effects? In Figures 13 and 14
we display the upper lip movement and EMG traces for perturbed and control
trials of /baeb/ (Figure 13) and /baep/ (Figure 14). To aid comparison, the
lower lip plus jaw trajectories are also shown. When the perturbation was
applied during the opening phase, the upper lip trajectories were variable and
no different from control when measured at the peak raising point, t(14), p >
.10 for /baeb/ and t(17) = 1.70, p > .10 for /baep/. However, in opening
phase perturbation trials, the upper lip does lower further on perturbed as
compared to control trials when lip position is measured at peak closure,
t(14) = 3.65, p < .01 (/baeb/) and t(17) = 3.51, p < .01 (/baep/). Presumably
this occurs to accommodate the reduction in lower lip-jaw height.

When the load was applied during closure, there was again a significant
upper lip lowering response for both /baeb/, t(12) = 2.77, p < .01 and /baep/,
t(18) = 2.68, p < .02, but no differences earlier in the trajectory at the
point of the peak raising movement, t(12) = 1.22 and t(18) = -1.32, ps > .10
for /baeb/ and /baep/, respectively.

In general, though the upper lip muscle recordings are good, clear
differences between perturbed and control trials were not readily discernible
in either timing or magnitude. For this subject, at least, OOS muscle
activation may be sufficient to generate upper lip motion until a collision
with the lower lip occurs. In short, there may be no necessary requirement for
a finely modulated EMG response in upper lip since bilabial consonants are
characterized by fixed boundary conditions.

General Discussion

Even simple speech gestures involve cooperation among very many degrees
of freedom operating at respiratory, laryngeal, and supralaryngeal levels.
Bernstein (1967) hypothesized that rather than controlling each degree of
freedom separately, the central nervous system collects multiple degrees of
freedom together into functional synergies or coordinative structures that
then behave, from the perspective of control, as a single unit. The present
research addresses Bernstein's hypothesis in an effort to identify and analyze
coordinative structures in speech. In this regard, it contrasts with much
other work on motor control whose focus is restricted to actions of a single
joint (see Stein, 1982, for many examples).

The hallmark of a coordinative structure as we define it (see also
Boylls, 1975; Fowler, 1977; Kelso & Holt, 1980; Kelso & Saltzman, 1982;
Kelso et al., 1979; Kugler, Kelso, & Turvey, 1980; Nashner et al., 1979;
Turvey, 1977) is the temporary marshalling of many degrees of freedom into a
task-specific functional unit. This definition should not be confused with
the traditional, reflex-based usage of synergy elaborated, for example, by
Easton (1972). As Szentagothai and Arbib (1974) have pointed out, such use of
the term "...is too restrictive to capture the concepts" (p. 165). Partly in
response to these authors' request for "...a redefinition of synergies to
revitalize motor systems research" (Szentagothai & Arbib, 1974, p. 165) we
have provided a recent elaboration of coordinative structures in terms of
their neurophysiological and behavioral manifestations (Kelso, Tuller, &
Harris, 1981/1983; Kelso & Tuller, 1983/1984).

The task-specificity hypothesized by coordinative structure theory is
supported in the present experiments. For both /b/ and /z/, rapid and highly
distinctive patterns of the upper lip, lower lip, and tongue occurred in
response to unexpected jaw loadings so that the desired sound was produced. In
all cases, the adjustments, though varied, were such as to preserve the
integrity of the phonetic act. For example, for /z/ frication in Experiments
1 and 2, there was no detectable upper lip movement. But, since the jaw was
much lower than usual, highly amplified tongue muscle activity, necessary to
obtain an appropriate alveolar position for fricative production, was observed.
Like the lips in /baeb/, the tongue in /baez/ responded remarkably quickly on
the very first perturbation trial and again with no slurring or distortion
perceptible to a listener. As in recent studies of bite-block speech (akin to
speaking with a pipe in one's mouth), in which sensory information was
drastically reduced by anesthetization of oral structures combined with
auditory masking, we found no evidence of any short-term "learning" (cf. Kelso
& Tuller, 1983). Articulatory "compensation" was achieved, therefore, with
little or no practice.

The coordinative structure account applies equally well to disruptions
that are static and anticipated (like the bite-block experiments) and those
that are time-varying and unanticipated. Adjustment to either type of
perturbation is a predictable outcome of an ensemble whose constituent muscles
function cooperatively as a single unit. If the operation of certain variables
is fixed, as in the bite-block case, or unexpectedly disturbed as a result of
on-line perturbation, functionally linked variables will preserve the
synergistic constraint. As we have emphasized before (Kelso & Tuller, 1983;
see also Abbs & Gracco, 1983) so-called "compensation" is characteristic of
the speech system's normal mode of operation. For example, in a study of
respiratory function during speech, Hixon, Mead, and Goldman (1976) found that
the relative contributions of thorax and abdomen movements adjust in order to
preserve a subglottal pressure level across large postural changes (e.g., lying
versus standing). Similarly, Sussman, MacNeilage, and Hanson (1973) in a
study of lip and jaw movements in a variety of vowel-consonant-vowel (VCV)
triads observed that jaw elevation at consonant closure was directly
proportional to the height of the following vowel. Thus, in order to occlude
the vocal tract for /p/ in /aepae/ versus /aepi/ the lips must "compensate"
differentially to accommodate different jaw positions. Both of these studies
suggest task-specific cooperation in naturally occurring situations.

One account of multimovement adjustments to unanticipated disruptions
posits a closed-loop peripheral feedback mechanism (cf. Abbs, 1979; Folkins &
Abbs, 1975). As we have pointed out, however (Fowler & Turvey, 1978, 1980;
Kelso, 1981; Kelso & Tuller, 1983), a closed-loop system, though capable in
theory of detecting and correcting "errors" in the perturbed structure, has no
mechanism for producing adaptive movements in remote and non-biomechanically
linked articulators. Because of this limitation, Abbs and Gracco (1983) have
recently proposed an "open-loop adjustment process" to account for upper lip
changes to lower lip perturbations "...based upon a pre-established
sensorimotor translation between lower lip afferent signals and upper lip
motor actions." This notion is similar to the predictive, feedforward
processes hypothesized by Ito (1974) for vestibular-ocular interactions during
eye-head movement, and elaborated more recently by Houk and Rymer (1981).
Viable though feedforward may be, it is nevertheless difficult to envisage
how, without the concept of coordinative structure, all the computation could
be pre-established in such a way that the lips, jaw, and tongue (not to mention
other possible articulators not observed in these experiments) perform
precisely those movements that meet the speaker's objective. The problem is
exacerbated when unexpected challenges are introduced whose dimensions (e.g.,
magnitude, duration, site) are potentially manifold. However, although the
particular neural processes involved await clarification, a central conclusion
of Abbs and colleagues' work, that the "...nervous system prioritizes
acoustically and aerodynamically significant multiaction gestures over
individual movements and muscle actions..." and that "...these sensorimotor
capabilities relieve the nervous system of having to prespecify the motor
details" (Abbs, in press) has much in common with the concept of coordinative
structure advocated here and elsewhere.

The results of the third experiment provide further evidence for a
task-specific coordinative structure style of motor control. Remote responses
in upper lip were found to be phase-dependent; that is, they occurred only
when they were functionally appropriate. Similar task-dependent forms of
articulator cooperation have been observed in recent studies of posture in
humans (e.g., Cordo & Nashner, 1982; Marsden, Merton, & Morton, 1981, 1983).
For example, Marsden et al. (1983) applied a small perturbation to the thumb
of a standing subject as he was performing a thumb tracking task, and observed
reactions in muscles remote from the prime mover (e.g., in pectoralis major;
in the triceps of the opposite limb when it gripped a table top; in the
opposite thumb when it served to stabilize motion, etc.). These distant
reactions were very rapid (e.g., 40 ms in pectoralis), sometimes faster than
the local autogenetic response in the structure perturbed. Though exquisitely
sensitive, they are not caused by length changes in the postural muscles
themselves. Perturbations of only 7.5 g to the thumb or wrist, often not even
detected by the subject, were associated with brisk, distant reactions.
Finally and importantly, distant reactions occurred only when they performed a
useful function and they were flexibly tuned to that function. Postural
responses in triceps disappeared if the hand was not exerting a firm grip on
the object. If, instead of holding a table top, the non-tracking hand held a
cup of tea, the responses in triceps reversed, which is exactly what they have
to do to prevent the tea from spilling. Marsden et al. (1983) conclude that
these rapid, remote effects "...constitute a distinct and apparently new class
of motor reaction" (p. 645) that has caused them to abandon an account based
on stretch reflexes. As the previous discussion indicates, however, similar
phenomena have been present (although perhaps not sufficiently recognized
until recently) in the speech literature as well.

To summarize, though the speed of the adaptive reactions observed in our
experiments could be described as reflexive, their mutability speaks against
any fixed reflex connections or rigidly constructed servomechanisms. Thus,
the system we are dealing with appears to be "softly" assembled and flexible
in function, not machine-like and rigid (Iberall, 1978; see also Abbs &
Gracco, 1983). Similarly, it is extremely doubtful that the articulatory
patterns observed here in response to jaw loading at different phases of motion
and in different phonetic contexts are programmed completely in advance.

The present data, preliminary though they are, suggest nevertheless that
the mode of operation of the speech system is intrinsically task-oriented, and
that both rapid local and remote articulatory contributions are involved in
the implementation of cooperative action. But most importantly, the
adjustments appear to reflect a synergistic organization among articulators
that is tailored to the requirements of the spoken act. As Bernstein (1967,
p. 69) intimated:

Movements react to one single detail with changes in a whole series
of others that are sometimes very far from the former both in space
and time ... In this way movements are not chains of details but
structures which are differentiated into details



Or consider Dewey's (1896; cited in Fearing, 1930) remarks that the relations
between sensory stimuli and motor consequences do not constitute a "fixed
existence" but a "flexible function".⁷ Herein lie kernel themes for a research
program on coordinative structures that differs radically from approaches that
focus on control around a single joint. The present work represents only a
modest, but we think promising, beginning.

Footnotes

¹Initially Folkins and Abbs (1975, p. 218) interpreted their data as
support for online feedback processing, that is, "a lip control system that is
adjusted on the basis of feedback information about the relative position of
the lips and jaw." A more recent interpretation, or perhaps a redescription,
by Abbs and Cole (1982, p. 171) is that the data support "a feedforward,
open-loop control process..." in which "...information is fed forward for
making adjustments in motor commands to structures having parallel
involvements." Suprabulbar pathways are hypothesized to play a mediating role.

²Anecdotal evidence for such tailoring is reported by Abbs and Gracco
(1983), who noticed that upper lip compensation to a lower lip perturbation
occurs in the utterance /aba/ but not in /afa/. Neither data nor reference
citation to this finding is presented, however. Similarly, Folkins and
Zimmermann (1982, p. 1232) conclude their paper on electrical stimulation of
the lower lip with the suggestion that "...it may be that interactions between
the lips and jaw may be different for bilabial closing, bilabial opening,
labiodental closing, and lip rounding gestures." (italics ours). Again, a
direct test of this hypothesis, which we conduct here, has not been made. In
fact, all the dynamic perturbation studies conducted thus far have involved
bilabial gestures.

³Some explanation is necessary about the small number of subjects and the
chronological aspects of the research. Since these experiments started in
late 1978 we have tried to prepare a total of four subjects for participation.
In each case special dental casts were made of the upper and lower teeth,
prior to constructing a titanium prosthesis for the lower jaw. Only with two
subjects, however, was it possible to proceed according to plan for the
following reasons. First, in order to seat the prosthesis in the mouth firmly
so that it did not come out or reverberate when a load was applied, it was
necessary to have a subject who had at least one (preferably several) of the
rear molars missing (see Figure 1C). Second, and relatedly, it was crucial to
have sufficient clearance at the sides of the subject's mouth so that the
protruding rods to the torque motor did not interfere in any way with the
subject's speech. Two subjects met these criteria, though the second subject
did not become available until early 1983. We tried to test him in the larger
version of Experiment 1, but he was unable to withstand the insertion of fine
wire electrodes into the tongue and hence could not be used to study fricative
production. Because of these difficulties we can report only our efforts to
provide a within-subject replication of the experiment (Experiment 2). The
second subject, however, participated in Experiment 3, which did not require
invasive procedures. We did not run subject 1 in the latter study because we
were concerned about possible experiential factors influencing the results.

⁴Peak lip displacement can occur after closure is attained because of the
elastic nature of the lips. Once the upper and lower lips touch, achieving
closure, they can and usually do compress further as closure proceeds.

⁵The large burst of genioglossus activity evident in /baeb/ and also the
second peak in /baez/ is related to production of the /g/ in the carrier
phrase "again." Examination of the acoustics revealed that the torque
occurred closer to the onset of /b/ closure than to /z/ frication. This is
reflected in the proximity of genioglossus activity to torque onset in /baeb/
relative to /baez/.

⁶In the following analyses, there are always ten control trials to
compare with the perturbed trajectories. However, because of technical
difficulties (e.g., the subject making non-speech jaw movements that triggered 
the perturbation), there are not always ten perturbed trials. We present 
therefore the pooled degrees of freedom (N-2) for statistical tests, although 
we have performed all the tests using the adjusted degrees of freedom (after 
Scheffe) as well. Pooled and adjusted results are very similar; however, 
where they diverge we will report both. 
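
The pooled degrees of freedom mentioned in this footnote can be illustrated with a standard pooled-variance two-sample t statistic. The sketch below is only an assumption about the form of the test (the text specifies the pooled df, N-2, but not the formula), it does not implement the adjusted degrees of freedom, and the sample values are invented.

```python
import math
from statistics import mean, variance


def pooled_t(perturbed, control):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(perturbed), len(control)
    df = n1 + n2 - 2                                   # pooled df, N - 2
    pooled_var = ((n1 - 1) * variance(perturbed) +
                  (n2 - 1) * variance(control)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(perturbed) - mean(control)) / se, df


# Invented peak positions (mm): 4 usable perturbed trials, 10 control trials.
perturbed = [-12.0, -11.0, -13.0, -12.0]
control = [-8.0, -9.0, -7.0, -8.0, -8.0, -9.0, -7.0, -8.0, -8.0, -8.0]
t, df = pooled_t(perturbed, control)
```

With ten control trials and four usable perturbed trials the pooled df is 12, which is why the degrees of freedom reported in the text vary from condition to condition.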

⁷Fearing's (1930) book is a most scholarly treatment of the reflex
concept in psychology and physiology. Given recent findings (see General
Discussion) the book has a prophetic tone. For example, Dewey's remarks, made
in 1896, offer a stark contrast with Sherrington's in 1906. Sherrington on the
one hand admitted that the reflex was a "likely if not probable fiction," but
on the other referred to it as having "a machine-like fatality" (cited in
Fearing, 1930, Chapter 16). Fearing's conclusion (pp. 313-315), in which he
advocates an experimental approach that does not focus on isolable fragments
of an action, but rather examines the relations among concomitant events in
the integrated nervous system, is anticipatory of some, but by no means all,
current work on motor control.

FORMANT INTEGRATION AND THE PERCEPTION OF NASAL VOWEL HEIGHT*

Patrice Speeter Beddor



Abstract. Research on oral vowels has shown that vowel perception
involves integration of adjacent spectral components such that
perceived height correlates with the center of the first region of
spectral prominence or "center of gravity". This study investigated
the center-of-gravity effect in nasal vowels and asked whether
formant integration in vowel perception extends to the first oral
formant, F1, and the first nasal formant, FN. Five nasal vowels,
[ĩ ẽ æ̃ ã õ], were synthesized. For each nasal vowel, a continuum of
synthetic oral vowels was generated by manipulating the frequency of
F1. Five vowel sets were constructed by pairing the nasal vowel
standard with each member of the corresponding oral vowel continuum;
listeners selected the "best-match" pair for each set. Listeners
chose the oral-nasal pairs with the same F1 frequency in vowel set i
only. For e, ae, a, and o, listeners' matches depended on the
relative position of F1 and FN in the nasal vowel: when FN frequency
was less than F1, as in [æ̃] and [ã], the best oral match had a
relatively low F1 frequency; when FN frequency exceeded F1, as in [ẽ]
and [õ], the oral match had a high F1. These perceptual data
indicate spectral averaging of adjacent oral and nasal vowel formants,
thereby demonstrating the center-of-gravity effect in the perception
of nasal vowels.



This paper reports the results of a study of the acoustic features
determining perceived height in nasal vowels. Most previous research on the
perception of vowel height has dealt with oral vowels. Phoneticians generally
acknowledge that the perceptual dimension of height in oral vowels is
inversely correlated with the frequency of the first formant, such that height
perceptually lowers as first formant frequency increases (Fant, 1960; Joos,
1948; Ladefoged, 1982; Peterson & Barney, 1952). But despite this
correlation, the frequency of the first formant is not the sole determinant
of perceived vowel height.



*A shorter version of this paper was presented at the Annual Meeting of the
Linguistic Society of America in Minneapolis on December 30, 1983.

Experimental evidence indicates that the first formant does not
acoustically specify height in oral vowels when the frequencies of the first
two vowel formants are relatively close together, as in back vowels. Studies
with synthetic vowels have shown that perceived height in back vowels is
determined not only by the first formant (F1), but also by the second formant
(F2). In experiments where one-formant vowel approximations were perceptually
matched to two-formant back vowel stimuli, the frequency of the single formant
was not matched to F1 or F2 of the two-formant stimulus, but was instead
located between F1 and F2 (Bedrov, Chistovich, & Sheikin, 1978). Similarly,
Delattre, Liberman, Cooper, and Gerstman (1952) found that reduction in F1
amplitude perceptually lowered back (but not front) vowels, while reduction in
F2 amplitude perceptually raised back vowels, leading to the speculation that
"the ear effectively averages two vowel formants which are close together"
(1952, p. 203).

Perceptual averaging of vowel spectrum components that are relatively
close in frequency is not restricted to F1 and F2, but also occurs for F2 and
F3 (Bladon & Fant, 1978; Carlson, Fant, & Granström, 1975; Carlson,
Granström, & Fant, 1970; Miller, 1953) as well as for the first harmonic and
F1 (Carlson, Fant, & Granström, 1975; Fujisaki & Kawashima, 1968;
Traunmüller, 1981). A substantial body of data therefore indicates that
perception of vowel quality involves calculation of a weighted mean of
adjacent spectral prominences rather than merely extraction of the frequencies
of the spectral peaks. That is, when two spectral prominences fall within
some critical frequency range, vowel quality is determined by the "center of
gravity" of the region of prominence (Chistovich & Lublinskaya, 1979;
Chistovich, Sheikin, & Lublinskaya, 1979). The center-of-gravity effect
disappears when the distance between spectral peaks exceeds 3.0 to 3.5 Bark
(Chistovich & Lublinskaya, 1979; Syrdal & Gopal, 1983).¹

This study extends investigation of the center-of-gravity effect to nasal
vowels. The acoustic theory of vowel nasalization predicts that
velopharyngeal coupling of the nasal tract to the main vocal tract adds
pole-zero pairs and shifts formant frequencies of the transfer function of the
coupled system (i.e., nasal vowel) relative to the transfer function of the
uncoupled (non-nasal) system. Especially important to this study of nasal
vowel height is that the main acoustic effect of nasal coupling is in the
region of F1, where F1 of the non-nasal vowel is replaced in the nasal vowel
by two poles and a zero (Fant, 1960; Fujimura & Lindqvist, 1971; Hamada,
1983; Stevens, Fant, & Hawkins, in press). The two poles are the first nasal
formant and the first oral formant, the latter typically being shifted in
frequency, with a wider bandwidth and lower amplitude than the first formant
of the non-nasal vowel (Delattre, 195{J; House & Stevens, 1956; Mrayati,
1975). Thus the low-frequency region of nasal vowel spectra is characterized
by a relatively flat, wide distribution of acoustic energy (see Maeda, 1982).
Some of these spectral properties of nasal vowels are illustrated in Figure 1
by the spectrum of a Hindi speaker's nasal [ẽ] (solid curve), superimposed on
the spectrum of Hindi oral [e] (dashed curve). Note that the low-frequency
spectral energy of [ẽ] is spread across two broad spectral prominences while
[e] has a single narrow low-frequency spectral peak.

The present study asks if formant averaging in vowel perception
generalizes to adjacent oral and nasal vowel formants. Our purpose was to
determine whether the perception of height in nasal vowels involves spectral
integration of the first oral formant, F1, and the first nasal formant, FN.
The oral vowel studies reviewed above might lead us to expect F1-FN averaging
since the distance between F1 and FN in many nasal vowels is less than 3.5
Bark, i.e., less than the critical distance found for spectral integration of
oral vowel components. (For example, the distance between the first two
spectral peaks of nasal [ẽ] in Figure 1 is roughly 2.8 Bark.) Previous nasal
vowel research also points toward possible F1-FN integration. Joos (1948)
suggested that French /ɛ̃/ sounded like [æ̃] because the average frequency of
F1 and FN in nasal /ɛ̃/ corresponds to F1 in oral /æ/. Similarly, Fant (1960)
and Wright (1980) speculated that shifts in perceived vowel height
accompanying nasal coupling might be due to the additional low-frequency
nasal resonance.

Figure 1. LPC spectra of nasal [ẽ] (solid curve) and oral [e] (dashed curve)
produced by a Hindi speaker. The nasal vowel spectrum has two
broad spectral prominences in the low-frequency region while the
oral vowel spectrum has a single narrow low-frequency spectral
peak.

Method

Stimulus Materials

The stimulus materials were five sets of nasal and oral vowels generated
on the Haskins serial software formant synthesizer. Each 360-ms stimulus
consisted of steady-state vowel formants, with fundamental frequency and
amplitude decreasing over the final 120 ms.

The five nasal vowel stimuli, [ĩ ẽ æ̃ ã õ], were synthesized by adding a
pole-zero pair in the vicinity of the first pole to the five-pole transfer
function for an oral vowel. The spectral characteristics of the synthetic
nasal vowels were based on FFT and LPC analyses of natural vowel tokens from
several languages (Beddor, 1983). Autoregressive LPC spectra of the
synthesized nasal vowels are shown in Figure 2, along with the measured
frequencies of the first two spectral peaks. The labels assigned to these
peaks are to be interpreted with caution, since identifying the "first oral
formant" and the extra "nasal formant" of nasal vowels is a terminological
problem (see Stevens, Fant, & Hawkins, in press). The convention adopted here
is to label "F1" the first peak in the high and mid nasal vowels, [ĩ], [ẽ],
and [õ], and the second peak in the low nasal vowels, [æ̃] and [ã] (these "F1"
values being close to typical F1 frequencies for the oral vowels [i], [e],
[o], [æ], and [a]). That is, F1 frequency was less than FN frequency in high
and mid vowels and greater than FN frequency in low vowels, which is
consistent with the acoustic theory of vowel nasalization (Fant, 1960;
Fujimura & Lindqvist, 1971) as well as previous analyses of natural nasal
vowel tokens (e.g., Fujimura, 1961; Wright, 1980). The added zero was set
between the first oral pole and the additional pole for all nasal vowels
except high [ĩ], where the zero separated the additional pole and the second
oral pole (see Fujimura, 1961; Maeda, 1982).

Figure 2. Spectra of the five synthetic nasal vowel stimuli.

For each nasal vowel, a continuum of oral vowels was constructed by
omitting the extra pole-zero pair. Within each oral continuum, stimuli were
identical to each other except for the frequency of F1, which was
systematically varied as shown in Table 1. F1 step-size in each continuum was
approximately 10% of the average F1 frequency for that vowel set. (Thus step
sizes were larger for lower vowels, e.g., F1 step-size was 32 Hz for i, 45 Hz
for e, and 60 Hz for æ.) The F1 range of each oral continuum included two
vowels of special interest. One of these oral vowels was an "F1 match": the
frequency of its first formant was the same as the F1 frequency of the
corresponding nasal vowel. (This can be seen by comparing the oral vowel F1
values designated by * in Table 1 with the nasal vowel F1 values in
Figure 2.²) A second oral vowel from each of the five series was a "centroid
match" (also marked in Table 1);³ this stimulus matched the corresponding
nasal vowel on a specific measure of center of gravity.

The centroid of a vowel is a measure of the center of gravity calculated
from the LPC spectrum of that vowel. The centroid (CeN) function computes the
mean frequency of the area under the spectral curve within specified frequency
and magnitude ranges according to the formula



where X = frequency (Hz) and Y = log magnitude (dB). Figure 3 demonstrates
the operation of the centroid function for nasal [ẽ]. The left and right
vertical bars delimit the frequency range of 100-1100 Hz and the connecting
horizontal bar sets the lower magnitude limit. The spectral curve forms the
upper magnitude limit. The center frequency or centroid of this area, 526 Hz,
is shown by the dashed vertical line. The frequency and magnitude ranges
selected in this study were based on analyses of over 800 natural-speech
tokens of oral and nasal vowels (see Beddor, 1983, for discussion of these
ranges). The frequency range of 100-1100 Hz was used for all vowel stimuli
except for the low central vowels, for which the upper limit was extended to
1400 Hz.⁴ The lower magnitude limit was determined separately for each
stimulus and was set just below the lowest point in the 100-1100 (or 1400) Hz
portion of the spectral curve. The area measured by the centroid function
included F1 in all vowels, but also FN in the nasal vowels and F2 in the
non-front (oral and nasal) vowels.

Table 1. F1 values (in Hz) for the oral vowel sets.

Figure 3. Illustration of the centroid function using the mid front nasal
vowel stimulus, [ẽ]. The figure indicates the region of the
spectrum analyzed by the centroid function: the vertical bars delimit
the 100-1100 Hz frequency range, the horizontal bar sets the lower
magnitude limit, and the spectral curve forms the upper magnitude
limit. The dashed line marks the center frequency or centroid of
this region.

Figure 4 compares the stimuli designated "F1 match" and "centroid match"
from vowel set æ. In the upper panel, the F1 match, we see that the frequency
of the first peak in the oral vowel spectrum (dashed curve) and the frequency
of the second peak in the nasal vowel spectrum (solid curve) are the same. In
contrast, in the centroid match in the lower panel, the first peak in the oral
vowel straddles the two low-frequency peaks of the nasal vowel; while these
two spectra share no peak frequency in the first region of spectral
prominence, the center frequency of this region is the same in the two
spectra.

Figure 4 also shows that, in vowel set æ, the oral vowel of the
centroid-matched pair has a lower F1 frequency than the oral vowel of the
F1-matched pair. This is also true of the low vowel set a, as indicated by
the values in Table 1. In contrast, in the non-low vowel sets, i, e, and o,
the centroid match has a higher F1 frequency than the F1 match. This is due
to the location of the first nasal formant relative to the first oral formant
in the nasal vowels (see Figure 2): when FN frequency is less than F1
frequency, as in low nasal vowels, FN pulls down the center of gravity; when
FN is greater than F1, as in high and mid nasal vowels, FN pulls up the center
of gravity.
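As an illustration only, the centroid measure described above can be sketched in Python. The formula itself is not reproduced in the text, so the form used here (a mean frequency weighted by the height of the spectral curve above the lower magnitude limit, on a uniformly spaced frequency axis) is an assumption. The function names, the 1-dB margin below the spectral minimum, and the toy spectra are likewise hypothetical and are not the synthesized stimuli.

```python
import math


def centroid(freqs_hz, mags_db, f_lo=100.0, f_hi=1100.0, floor_db=None):
    """Mean frequency of the area between a magnitude floor and the spectrum.

    Assumed form: sum(X * (Y - floor)) / sum(Y - floor) over samples with
    f_lo <= X <= f_hi, where X is frequency (Hz) and Y is log magnitude (dB).
    Samples are assumed to be uniformly spaced in frequency.
    """
    pts = [(f, m) for f, m in zip(freqs_hz, mags_db) if f_lo <= f <= f_hi]
    if not pts:
        raise ValueError("no spectral samples inside the frequency range")
    if floor_db is None:
        # Hypothetical choice: set the floor 1 dB below the lowest point.
        floor_db = min(m for _, m in pts) - 1.0
    heights = [m - floor_db for _, m in pts]
    return sum(f * h for (f, _), h in zip(pts, heights)) / sum(heights)


def toy_spectrum(peaks, freqs_hz):
    """Sum of bell-shaped peaks; each peak is (center_hz, height_db, width_hz)."""
    return [
        sum(h * math.exp(-((f - c) / w) ** 2) for c, h, w in peaks)
        for f in freqs_hz
    ]


freqs = list(range(100, 1101, 10))

# Low-vowel case: F1 near 700 Hz, extra nasal peak below it.
oral_low = toy_spectrum([(700, 30, 80)], freqs)
nasal_low = toy_spectrum([(700, 25, 120), (400, 20, 120)], freqs)

# Non-low case: F1 near 400 Hz, extra nasal peak above it.
oral_mid = toy_spectrum([(400, 30, 80)], freqs)
nasal_mid = toy_spectrum([(400, 25, 120), (700, 20, 120)], freqs)

print("low vowel:", centroid(freqs, oral_low), centroid(freqs, nasal_low))
print("mid vowel:", centroid(freqs, oral_mid), centroid(freqs, nasal_mid))
```

Under these assumptions the extra peak shifts the centroid toward itself, downward when it lies below F1 and upward when it lies above, which is the pattern the paragraph above describes for low versus non-low nasal vowels.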

Subjects 

Twenty paid student volunteers participated in the experiment. All were
native speakers of American English with no known hearing loss and no
expertise in phonetics. Although several of the subjects had studied a
language in which the oral-nasal contrast in vowels is distinctive (e.g.,
French, Polish), this background had no apparent effect on their results.

Procedure 

Test sequences for the five vowel sets consisted of pairs of oral and
corresponding nasal vowels. For each set, two types of ordered sequences were
made: ascending sequences (i.e., each oral stimulus from 1 through n paired
with the nasal standard) and descending sequences (i.e., oral-nasal pairs from
n through 1). A pilot study in which listeners selected the "best-match"
oral-nasal pair from these sequences showed that matches tended to fall in the
middle of the vowel set. To eliminate clustering of responses in the center
of each vowel set, three truncated ordered sequences for each vowel set were
constructed from the full ascending and descending sequences. The truncated
sequences contained the following oral stimuli (paired with the corresponding
nasal vowel): i: 1-5 (twice), 2-6; e: 1-8, 2-9, 3-10; æ: 1-7 (twice),
2-8; a: 1-6, 2-7, 3-8; o: 1-6, 2-7, 3-8. The three truncated versions of
each of the five vowel sets were arranged in random order, for a total of 15
trials. The inter-stimulus interval between members of an oral-nasal pair was
.5 s and the interval between pairs in the ordered sequences was 1 s;
subjects controlled intervals across sequences and trials.

Before testing, subjects were given a brief description of the kinds of
vowel stimuli to be presented. Subjects were told that they would hear 15
sets of vowels, each set consisting of several vowel pairs. They were
informed that the first member of each pair varied across the series while the
second member stayed the same and that these pair members were "oral vowels"
and "nasal vowels," respectively. It was explained that nasal vowels usually
occur in English in the context of m or n, e.g., mom (versus the oral vowel in
Bob), man (versus bad), and moan (versus boat).

Figure 4. Spectra of the F1-matched stimulus pair (upper panel) and the
centroid-matched stimulus pair (lower panel) for the low front vowel
set æ.

Subjects were tested individually in a sound-attenuated booth. Stimuli
were presented binaurally over TDH-39 earphones with an interactive computer
program. At the onset of each of the 15 trials, the program presented the
ascending and descending truncated sequences for that vowel set. (The
relative order of ascending and descending sequences was counter-balanced
across trials.) The subject could then request repetitions of either of the
sequences or of individual oral-nasal pairs from the sequence. For each
trial, a subject was instructed to select that pair in which the oral vowel
was the most similar to the nasal standard; this "best-match" pair was circled
on a printed score sheet. A subject was encouraged to listen to the sequences
and to individual pairs as many times as needed to feel confident about the
best-match decision. Average testing time was approximately 45 minutes.

Results

The histograms in Figure 5 show subjects' responses to the five vowel
sets, i, e, æ, a, and o. As there was no apparent effect of truncation,
responses to the three truncated versions of each vowel set were pooled. The
data therefore represent 60 responses (20 subjects x 3 truncations) per vowel
set. Oral vowel stimulus number is on the ordinate and percent best-match
responses on the abscissa. The F1 match in each vowel set is indicated by *,
and the centroid match is marked separately.

Figure 5 shows that subjects' best-match responses to each vowel set are
spread over several stimulus pairs. Of special interest here are the F1- and
centroid-matched pairs. It was hypothesized that if perceived nasal vowel
height were determined by center of gravity, then the perceptually most
similar oral-nasal pair in each vowel set would be the centroid-matched pair.
If, however, perceptual integration of F1 and FN did not occur, then the most
similar pair might be expected to be the F1-matched pair.

As seen in Figure 5, the F1-matched oral-nasal pair in vowel set i
accounted for over 70% of subjects' responses. But in the remaining four
vowel sets, subjects perceived the F1-matched pair as the most similar pair
only 2% to 12% of the time. For each of the five vowel sets, a t-test of the
difference between the stimulus number of the F1 match and the mean stimulus
value of each subject's responses showed that responses differed significantly
from the F1-matched vowel pair: i, t(19) = 2.68, p < .05; e, t(19) = 11.8?,
p < .01; æ, t(19) = 15.88, p < .01; a, t(19) = 14.15, p < .01; and o, t(19) =
10.97, p < .01. These findings are consistent with the data of Wright (1980),
which showed that perceptual effects of nasalization on vowel height were not
always a function of acoustic effects of nasalization on first formant
frequency.
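The kind of test reported above can be sketched as follows, assuming (the text does not spell this out) a one-sample t-test on the per-subject difference between the mean chosen stimulus number and the stimulus number of the F1 match, which gives 19 degrees of freedom for 20 subjects. The subject means and the F1-match stimulus number below are invented for illustration and are not the experimental data.

```python
import math
from statistics import mean, stdev


def one_sample_t(values, reference):
    """t statistic and degrees of freedom for mean(values) versus reference.

    Uses the sample standard deviation (n - 1 denominator). The values must
    not all be identical, or the standard error is zero.
    """
    n = len(values)
    diffs = [v - reference for v in values]
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical data: each subject's mean best-match stimulus number over the
# three truncated versions of one vowel set.
subject_means = [
    4.0, 4.3, 3.7, 5.0, 4.7, 4.3, 3.3, 4.0, 4.7, 5.3,
    4.0, 3.7, 4.3, 4.7, 4.0, 5.0, 4.3, 3.7, 4.0, 4.3,
]
f1_match_stimulus = 2  # hypothetical stimulus number of the F1 match

t_value, df = one_sample_t(subject_means, f1_match_stimulus)
print(f"t({df}) = {t_value:.2f}")
```

A positive t here means the chosen oral vowels had, on average, higher stimulus numbers than the F1 match; the p value would be read from a t table at the stated degrees of freedom.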

Although listeners generally did not match oral and nasal vowels on the
basis of first formant frequency, they also tended not to choose the
centroid-matched pairs as perceptually similar. In the mid and low vowel sets,
the most frequently chosen oral-nasal pair fell between the F1 and centroid
pairs. This modal best-match response was closer to the centroid for mid
front e, but closer to F1 for the low vowels æ and a. However, due to the
centroid skew of the æ, a, and o distributions, subjects' mean response (given
in Figure 5) was closer to the centroid than to F1 for all four non-high vowel
sets. A t-test for each vowel set compared the difference between the
stimulus number of the centroid match and each subject's mean response to the
difference between the F1 match and mean responses. These analyses showed that
perceptually similar pairs of oral and nasal vowels were significantly closer
to the centroid-matched pair than to the F1-matched pair in the four non-high
vowel series: e, æ, o, t(19) = 3.27, 3.41, ^.S?, respectively, p < .01; and
a, t(19) = 2.24, p < .05. Only in the high vowel set i were listeners'
responses significantly closer to the vowel pair matched for F1 frequency,
t(19) = 18.15, p < .01.

Figure 5. Percent "best-match" responses to the oral-nasal vowel pairs for
the five vowel sets.

Discussion

The purpose of this experiment was to determine whether spectral
integration of the first oral and nasal formants occurs in the perception of
nasal vowels such that perceived nasal vowel height correlates with the center
of the first region of spectral prominence rather than with the frequency of
the first formant. The method used to elicit height judgments from
phonetically naive subjects required listeners to select from a continuum of
oral vowels the vowel that was perceptually most similar to a nasal vowel
standard. Since the oral stimuli differed from the nasal standard only in
low-frequency spectral characteristics, the selected oral match was taken as
an indication of the perceived height of the nasal vowel. The results suggest
that perception of nasal vowel height, as measured by this paradigm, involves
integration of low-frequency spectral prominences. Perceived nasal vowel
height was not determined solely by the first formant: with the exception of
high [ĩ], F1 accounted for very few of the listeners' responses. Rather,
listeners' responses showed very consistent deviations from F1: when the
frequency of FN was less than the frequency of F1, as in low [æ̃] and [ã], the
closest oral match had a relatively low F1 frequency; when FN frequency was
greater than F1 frequency, as in mid [ẽ] and [õ], the selected oral match had
a relatively high F1. In all four of these vowel sets, the selected F1
frequency of the oral vowel was intermediate relative to the F1 and FN
frequencies of the nasal vowel. Even for high [ĩ], over 80% of the non-F1
responses were pulled in the direction of the nasal formant. Thus our data
provide empirical support for previous speculations that the relative
positions of the first oral and nasal formants might influence perceived nasal
vowel height (Fant, 1960; Joos, 1948; Wright, 1980).

The finding that perceived nasal vowel height was not determined by the
frequency of a single low-frequency spectral peak but rather involved apparent
integration of low-frequency spectral components demonstrates the
center-of-gravity effect in the perception of nasal vowel height. The high
front nasal vowel, however, did not show strong evidence of perceptual
integration of F1 and FN: the majority of listeners' responses to [ĩ] point
toward F1 frequency as determining perceived height. A possible explanation
for this difference between the high and non-high vowels lies in the distance
between F1 and FN frequencies in [ĩ] versus [ẽ], [æ̃], [ã], and [õ]. As noted
above, Chistovich and Lublinskaya (1979) and Syrdal and Gopal (1983) report
that formant integration does not occur in oral vowels (i.e., the
center-of-gravity effect disappears) when formant distance exceeds 3.5 Bark.
In our stimuli, the separation between F1 and FN in the mid and low nasal
vowels was 2.5 and 3.4 Bark, respectively, while the separation for the high
nasal vowel was 4.5 Bark. F1-FN integration in the mid and low, but not the
high, nasal vowels is therefore consistent with previous oral vowel findings.

While the center-of-gravity effect is apparent in the mid and low vowel
data, it is also clear that perceived nasal vowel height did not correspond
exactly with our measure of center of gravity, the centroid. Although
obtained matches between oral and nasal vowels were significantly closer to
the centroid-matched pairs than to the F1-matched pairs, 34% to 74% of the
responses to the non-high vowel sets fell between the F1- and centroid-matched
pairs. This bias towards F1 in listeners' judgments indicates that, for the
centroid to reflect perceived vowel height, F1 should be given more weight
than in the current measure.⁵ Note, however, that such a revision is not
simply a matter of increasing the weight of the lowest-frequency spectral
peak, since in low [æ̃] and [ã], F1 was the second, rather than the first,
spectral prominence. Although identification of the spectral prominence
corresponding to F1 is problematic in nasal vowels, this problem does not
change our finding that subjects' responses were higher than the centroid for
low nasal vowels but lower than the centroid for non-low nasal vowels.
Furthermore, the F1 bias cannot be accounted for by increasing the weight of
the higher-magnitude spectral peak, since the magnitude of the second peak was
greater than the magnitude of F1 in mid [õ]. It appears, then, that no simple
weighting of spectral components in terms of their frequency and magnitude
will account for perceived center of gravity in nasal vowels. Whether oral
vowels show a similar discrepancy between perceived center of gravity and the
centroid is currently under investigation.

In summary, although our measure of center of gravity needs to be
revised, the results clearly evidence the center-of-gravity effect in the
perception of nasal vowel height. Previous studies with oral vowels have
shown that vowel formants are integrated over frequency intervals which are
broader than a critical band (Bladon, 1983; Chistovich & Lublinskaya, 1979;
Syrdal & Gopal, 1983). Our findings with the first oral and nasal formants of
nasal vowels show that nasal vowel formant energy is also integrated over
relatively wide frequency intervals. Whether the critical distance for formant
averaging is the same in nasal vowels as in oral vowels needs further study.
The data presented here, however, are consistent with the critical distance of
3.5 Bark previously reported for oral vowels.

Footnotes

¹The Bark scale divides the audible frequency range into units of
critical bands, where 1 Bark equals one critical band. The relationship of
Hertz to Bark is expressed in the following equation from Schroeder, Atal, and
Hall (1979),

f = 650 sinh(x/7)

where f is frequency in Hz and x is frequency in Bark.
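For readers who wish to apply the conversion, a minimal Python sketch follows. The forward function is the equation above; the inverse is obtained by solving it for x, giving x = 7 asinh(f/650). The function names, the helper that compares a peak separation with the 3.5-Bark critical distance, and the example frequencies are illustrative assumptions, not values from the stimuli.

```python
import math


def bark_to_hz(x):
    """Frequency in Hz for a frequency x in Bark: f = 650 sinh(x / 7)."""
    return 650.0 * math.sinh(x / 7.0)


def hz_to_bark(f):
    """Inverse of bark_to_hz: x = 7 asinh(f / 650)."""
    return 7.0 * math.asinh(f / 650.0)


def within_critical_distance(f_a, f_b, limit_bark=3.5):
    """True if two spectral peaks (in Hz) lie closer than limit_bark."""
    return abs(hz_to_bark(f_a) - hz_to_bark(f_b)) < limit_bark


# Hypothetical peak frequencies, chosen only to show both outcomes.
for low, high in [(400.0, 800.0), (300.0, 1000.0)]:
    separation = hz_to_bark(high) - hz_to_bark(low)
    print(low, high, round(separation, 2), within_critical_distance(low, high))
```

Because the scale is compressive, a fixed separation in Hz corresponds to fewer Bark at higher frequencies, so the comparison with the critical distance has to be made after conversion rather than on the raw frequency difference.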

²Since there is a problem in identifying the first oral versus the first
nasal formant of nasal vowels, we might ask whether the F1 match indeed
matches first oral formant frequencies of the oral and nasal vowels or whether
it might be an F1-FN match in some vowel sets. One way to avoid this issue
would be to extend the F1 range covered by each oral vowel continuum to
include the frequencies of both F1 and FN of the corresponding nasal vowel.
However, a pilot study with such extended continua indicated that pairs in
which F1 frequency of the oral vowel matched what we have labeled "FN"
frequency of the nasal vowel were very poor perceptual matches. Inasmuch as
these extended series were unnecessarily long, the "FN" end of the series was
omitted in the actual experiment.

³For all vowel sets, matches between oral and nasal vowels in F1 and
centroid values were based on measurements of LPC spectra of these vowels. In
the LPC analysis, predictor coefficients were calculated for each oral
vowel and 18 for each nasal vowel. To verify the LPC measures, F1 and
centroid values were also obtained from FFT spectra of the vowel tokens. These
frequencies were within 15 Hz of the LPC measures.

⁴For a single frequency range to be applied in each vowel set, a rather
broad frequency range was necessitated by the variation in F1 frequency in the
oral continua (the F1 frequencies of the endpoint stimuli in a continuum were
up to 490 Hz apart). This broad range, however, is not meant to imply that
perceivers average spectral information over a 1000 Hz range. A more accurate
interpretation is that these frequencies might be relevant to perception of
vowel height; additional research is of course necessary to determine the
limits of the relevant frequency range.

⁵Similarly, Carlson, Fant, and Granström (1975) reported that efforts to
calculate F2' as a linearly weighted mean frequency of F2, F3, and F4 were
unsuccessful. Their revised formula gave greater weight to F2 when F2 was
close to F3 but greater weight to F3 and F4 when F2 and F3 were far apart.

RELATIVE POWER OF CUES: F0 SHIFT VS. VOICE TIMING*

Arthur S. Abramson† and Leigh Lisker††



Background

The acoustic features that bear information on the identity of phonetic
segments are commonly called cues to speech perception. These cues do not
typically have one-to-one relationships with phonetic distinctions. Indeed,
research usually shows more than one cue to be pertinent to a distinction,
although all such cues may not be equally important. Thus, if two cues, x and
y, are relevant for a distinction, it may turn out that for any value of x, a
variation of y will effect a significant shift in listeners' phonetic
judgments, but that there will be some values of y for which varying x will
have negligible effect on phonetic judgments. We say then that y is the more
powerful cue.

A good deal of evidence now exists to show that the timing of the
valvular action of the larynx relative to supraglottal articulation is widely
used in languages to distinguish homorganic consonants. The detailed
properties of the distinctions thus produced depend on glottal shape and
concomitant laryngeal impedance or stoppage of airflow, as well as on the
phonatory state of the vocal folds. Such acoustic consequences as the
presence or absence of audible glottal pulsing during consonant closures or
constrictions, the turbulence called aspiration between consonant release and
onset or resumption of pulsing, and damping of energy in the region of the
first formant, have all been subsumed by us (Lisker & Abramson, 1964, 1971)
under a general mechanism of voice timing. In utterance-initial position, the
phonetic environment in which consonantal distinctions based on differences in
the relative timing of laryngeal and supraglottal action have been most often
studied, this phonetic dimension has commonly been referred to as voice onset
time or VOT.

Although the acoustic features just mentioned, and perhaps some others,
may be said to vary under the control of the single "mechanism" of voice
timing, it is of course possible, by means of speech synthesis, to vary them
one at a time to learn which of them are perceptually more important. We must
not forget, however, that such experimentation involves pitting against one
another acoustic features that are not independently controlled by the human
speaker.

*Also to appear in V. Fromkin (Ed.), Phonetic linguistics. New York:
Academic Press.
†Also University of Connecticut.
††Also University of Pennsylvania.

Abramson & Lisker: Relative Power of Cues: F0 Shift vs. Voice Timing



A relevant feature not so far mentioned is the fundamental frequency (F0)
of the voice. If we assume a certain F0 contour as shaped by the intonation
or tone of the moment, there is a good correlation between the voicing state
of an initial consonant and the F0 height and movement at the beginning of
that contour (House & Fairbanks, 1953; but see also O'Shaughnessy, 1979, for
complications). After a voiced stop, F0 is likely to be lower and shift
upward, while after a voiceless stop it will be higher and shift downward
(Lehiste & Peterson, 1961). Although the phenomenon has not been fully
explained, it is at least apparent that it is a function of physiological and
aerodynamic factors associated with the voicing difference.

The data derived from the acoustic analysis of natural speech can be
matched by experiments with synthetic speech that demonstrate that F0 shifts
can influence listeners' judgments of consonant voicing (Fujimura, 1971;
Haggard, Ambler, & Callow, 1970; Haggard, Summerfield, & Roberts, 1981). Of
further interest in this connection is the claim that phonemic tones have
developed in certain language families through increased awareness of these
voicing-induced F0 shifts and their consequent promotion to distinctive pitch
features under independent control in production (Hombert, Ohala, & Ewan,
1979; Maspero, 1911).

Our motivation for the present study was to put F0 into proper perspective
as one of a set of potential cues to consonant voicing coordinated by
laryngeal timing. After all, our own earlier synthesis (Abramson & Lisker,
1965; Lisker & Abramson, 1970) yielded quite satisfactory voicing
distinctions without F0 as a variable. In addition, Haggard et al. (1970) may
have exaggerated its importance in the perception of natural speech by their
use of a frequency range of 163 Hz, one very much greater than, for example,
the range of less than 40 Hz found for English stop productions by Hombert
(1975). We set out to test the hypothesis that the separate perceptual effect
of F0 is small and dependent upon voice timing, while the dependence of the
voice timing effect on F0 is virtually nil. We used native speakers of
English as test subjects.

Procedure 

Making use of the Haskins Laboratories formant synthesizer, we prepared a
pattern appropriate to an initial labial stop followed by a vowel [a].
Variants of this pattern were then synthesized with VOT values of 5, 20, 35,
and 50 ms after the simulated stop release. These values were chosen because
of earlier work (Figure 1) that determined English voicing judgments for a
VOT continuum ranging from 150 ms before release to 150 ms after release.
This range of VOT values was sampled at 10 ms intervals, except for the span
from 10 ms before release to 50 ms after release, which was sampled at 5 ms
intervals. Those stimuli for which voice onset followed release, i.e., to the
right of 0 ms on the abscissa, had noise-excited upper formants during the
interval between the burst at VOT = 0 and the onset of voice. In the labial
data at the top of the figure the perceptual crossover point between /b/ and
/p/ falls just after 20 ms of voicing lag. Thus we expected that the extreme
values of our more limited range would be heard as unambiguous /b/ and /p/,
given an unchanging F0, while the category boundary, lying somewhere between,
might be shifted one way or the other as the F0 was varied. In addition to a
set of VOT variants having an F0 fixed at 114 Hz, we imposed onset
frequencies of 98, 108, 120, and 130 Hz, values commensurate with ranges
reported for natural speech (Hombert, 1975; House & Fairbanks, 1953; Lea,
1973; Lehiste & Peterson, 1961). That is, the F0 at voicing onset for each
variant began at one of those frequencies and shifted upward or downward to a
level of 114 Hz, where it stayed for the rest of the syllable. These F0
shifts were of three durations: 50, 100, and 150 ms. These fit with our own
cursory observations and bracket the value of 100 ms found by Hombert (1975).
We recorded the resulting 52 stimuli (two tokens of each) in three
randomizations and played the tapes to 11 native speakers of English for
labeling as /b/ or /p/. The subjects, three women and eight men, represented
a wide variety of regional dialects, ten in the United States and one in
Britain.

Figure 1. English voicing judgments for stops varying in VOT. Below each
pair of curves, a histogram (from Lisker & Abramson, 1964) of frequency
distributions of VOT in speech. (Reproduced from Lisker & Abramson, 1970.)
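
The stimulus design just described can be enumerated as a check on the count of 52. The sketch below is illustrative only: the function and variable names are ours, and the linear shape of the F0 glide is an assumption, since the text specifies only the onset frequency, the 114 Hz level, and the shift duration.

```python
from itertools import product

VOT_MS = (5, 20, 35, 50)             # voice onset times after release
F0_FLAT_HZ = 114                     # level F0 reached after any shift
F0_ONSETS_HZ = (98, 108, 120, 130)   # imposed onset frequencies
SHIFT_DURATIONS_MS = (50, 100, 150)  # durations of the F0 shift


def stimulus_conditions():
    """List (vot_ms, f0_onset_hz, shift_ms) for every stimulus.

    A shift duration of 0 marks the flat-F0 variant.
    """
    conditions = []
    for vot in VOT_MS:
        conditions.append((vot, F0_FLAT_HZ, 0))
        for onset, dur in product(F0_ONSETS_HZ, SHIFT_DURATIONS_MS):
            conditions.append((vot, onset, dur))
    return conditions


def f0_at(t_ms, onset_hz, shift_ms, level_hz=F0_FLAT_HZ):
    """F0 in Hz at t_ms after voicing onset.

    Assumption: the glide from onset_hz to level_hz is linear.
    """
    if shift_ms <= 0 or t_ms >= shift_ms:
        return float(level_hz)
    return onset_hz + (level_hz - onset_hz) * (t_ms / shift_ms)


if __name__ == "__main__":
    print(len(stimulus_conditions()))  # 4 VOTs x (1 flat + 4 onsets x 3 durations)
```

Each VOT value contributes one flat-F0 variant plus twelve shifted variants, which is where the total of 52 comes from.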

Results

The overall results are shown in Figure 2. The three panels are for the
durations of F0 shift. The abscissa of each panel shows the four VOT values,
while the ordinate gives the percentage identified as /p/ for each VOT. The
coded line standing for the variants with a flat F0 of 114 Hz is, of course,
a plot of the same data in all three panels. The 50% perceptual crossover
point for the flat F0 falls at about 25 ms of VOT. This is consistent with
the results for the more finely graded series of stimuli in Figure 1. Indeed,
for all conditions in Figure 2, it is VOT that is the main causative factor,
regardless of F0, with perceptual crossovers in the region of the VOT of
20 ms. With hindsight we can say that additional stimuli with VOTs of 15 and
25 ms would have given more precision. At the same time, we do note effects
of the fundamental frequency shifts: In each panel there is much spread of
data points for 35 ms, and none for 50 ms.
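
A 50% crossover point of the kind quoted above can be estimated from a coarse four-point labeling function by linear interpolation. The sketch below is a minimal illustration, not the authors' procedure; the function name is ours and the percentages in the example are hypothetical, chosen only to show the arithmetic.

```python
def crossover_vot(vots_ms, pct_p, criterion=50.0):
    """Estimate the VOT at which /p/ responses first reach `criterion` percent.

    Uses linear interpolation between adjacent points and assumes a labeling
    function that rises with VOT. Returns None if the criterion is never
    crossed.
    """
    points = list(zip(vots_ms, pct_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


if __name__ == "__main__":
    # Hypothetical percentages of /p/ responses at the four VOT values.
    print(crossover_vot((5, 20, 35, 50), (2.0, 35.0, 95.0, 100.0)))
```

With only four VOT steps, the estimate depends heavily on the two points that bracket the boundary, which is why intermediate VOT values would sharpen it.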

In Figure 3 we focus on the results for the stimuli with a VOT of 20 ms,
the one that shows the major effect of F0 shifts. For each of the four F0
onsets we see the percentage of /p/ responses. The coded lines stand for the
three durations of F0 shift. A rather general upward trend in /p/ responses
is evident as F0 onset rises. A two-way analysis of variance yielded a
significant main effect for F0 onset, F(3, 30) = 36.45, p < 0.001, and a
strong interaction between shift duration and F0 onset for each duration,
F(6, 60) = 6.00, p < 0.01.
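
The degrees of freedom reported for these F ratios are what one would expect if both factors were treated as within-subject factors for the 11 listeners. The sketch below works through that arithmetic; the fully repeated-measures design is our assumption (the text does not state the error terms), and the function name is ours.

```python
def repeated_measures_df(n_subjects, levels_a, levels_b):
    """Degrees of freedom (effect, error) for a two-way within-subject ANOVA.

    Assumption: each effect is tested against its own effect-by-subject
    interaction, the usual choice for a fully repeated-measures design.
    """
    df_subjects = n_subjects - 1
    df_a = levels_a - 1
    df_b = levels_b - 1
    df_ab = df_a * df_b
    return {
        "A": (df_a, df_a * df_subjects),
        "B": (df_b, df_b * df_subjects),
        "AxB": (df_ab, df_ab * df_subjects),
    }


if __name__ == "__main__":
    # 11 listeners, 4 F0 onsets (factor A), 3 shift durations (factor B).
    print(repeated_measures_df(11, 4, 3))
```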

Figure 4 focuses on the F0 onset of 130 Hz, the one that had the highest
number of /p/ identifications. The /p/ responses for this F0 onset at all
four VOT values are shown. Coded lines stand for the three shift durations;
the flat F0 plot, marked "no shift," is repeated from Figure 2. It is once
again obvious that the major effect is at the VOT of 20 ms, with the
deviation from "no shift" increasing with greater shift duration.


The spread of points at the VOT of 5 ms in Figure [illegible], although much
smaller than that at 20 ms, made us look for significant effects in
individual cells of the confusion matrix underlying all our plots. That is,
wherever we found apparent effects of fundamental frequency at VOT values
other than 20 ms, the locus of the main effect, we did a one-tailed t-test
for significant deviations from 100%. All such suspicious clusters of
responses were at VOT values of 5 ms and 35 ms; for the former, we expected
100% /b/ identifications and for the latter, 100% /p/ identifications. We
found three such significant deviations, all of them at the VOT of 5 ms:
(1) 120 Hz onset and 50 ms duration, t(10) = 2.7[illegible], p < 0.01;
(2) 130 Hz onset and 100 ms duration, t(10) = -2.51, p < 0.025;
(3) 130 Hz onset and 150 ms duration, t(10) = 2.799, p < 0.01. No such
significant deviations were found at the VOT values of 35 ms and 50 ms.
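
The tests just described compare the 11 listeners' scores in one cell against the expected 100%, giving 10 degrees of freedom. The sketch below shows the one-sample t statistic and checks the two fully legible t values against standard one-tailed critical values for df = 10. The function names and the per-listener scores are hypothetical; only the reported t values and p levels come from the text.

```python
from math import sqrt
from statistics import mean, stdev

# Standard one-tailed critical values of Student's t for df = 10.
T_CRIT_DF10 = {0.05: 1.812, 0.025: 2.228, 0.01: 2.764}


def one_sample_t(scores, mu):
    """Return (t, df) for a one-sample t-test of `scores` against `mu`.

    Undefined when every score is identical (zero standard deviation).
    """
    n = len(scores)
    t = (mean(scores) - mu) / (stdev(scores) / sqrt(n))
    return t, n - 1


def exceeds_critical(t, alpha):
    """True if |t| exceeds the one-tailed critical value for df = 10."""
    return abs(t) > T_CRIT_DF10[alpha]


if __name__ == "__main__":
    # Hypothetical percent-/b/ scores for 11 listeners in one cell.
    scores = [100, 100, 95, 100, 90, 100, 100, 85, 100, 95, 100]
    print(one_sample_t(scores, 100.0))
    # Reported values from the text, checked against the tabled criteria.
    print(exceeds_critical(2.799, 0.01), exceeds_critical(-2.51, 0.025))
```

Both legible values clear the critical value for the significance level reported with them, so the reported p levels are at least internally consistent.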
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Figure 3. Effects of FO shifts on VOT of 20 ras. 
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Conclusion 

We conclude that there is a modest effect of fundamental frequency shifts
on judgments of consonant voicing even within more natural ranges of F0
perturbation* than those in Haggard et al. (1970). This is much like the
results obtained in the investigation of Thai in an attempt at determining
the plausibility of arguments on the rise of distinctive tones (Abramson,
1975; Abramson & Erickson, 1978).

Although they too used a more natural F0 range, Haggard et al. (1981)
used an experimental design and stimuli that were somewhat different from
ours; their aims were also rather different. To the extent that their data
and ours are comparable, they support each other.

If, for the sake of considering the question of relative power of acoustic
cues in the perception of a phonetic distinction, we separate
fundamental-frequency shifts from the other cues linked to the dimension of
voice timing, voice onset time is clearly the dominant cue. Only VOT values
that are ambiguous with a flat F0 are likely to be pushed into one labeling
category or the other by F0 shifts in a forced-choice test. Finally, there
are values of VOT that are firmly categorical; they cannot be affected by F0.
There are, however, no values of fundamental frequency that cannot be
affected by voice onset time.

Footnote

*The normal ranges of F0 variation linked to consonant voicing, not only
in citation forms but especially in running speech (Lea, 1973; O'Shaughnessy,
1979), have still not been well described. We hope to report soon on our
current study of this matter with different sentence intonations as a
variable.

LARYNGEAL MANAGEMENT AT UTTERANCE-INTERNAL WORD BOUNDARY IN AMERICAN ENGLISH*
Leigh Lisker† and Thomas Baer

Abstract. Much attention has been given to the acoustic and physiological
means by which the /bdg/-/ptk/ distinction in English is signaled. The most
important articulatory difference has been found to involve the nature and
timing of laryngeal action associated with the stop articulation. For the
labial stops /b/ and /p/, at least three, and possibly four, phonetic classes
must be recognized, but we cannot assume that these make up the complete
inventory of the ways in which American English speakers coordinate lip and
larynx maneuvers in producing these phonemes. Acoustic and physiological data
obtained from one American English speaker who produced utterances containing
/b/ and /p/ in a variety of contexts showed at least five patterns of
lip-larynx coordination, that is, a degree of phonetic versatility usually
encountered in studies comparing different speakers across different
languages.

Introduction

For many years a good deal of attention has been given to the acoustic
and physiological aspects of phonetic distinctions represented by such
English word pairs as PILL-BILL, RAPID-RABID, and RIP-RIB. Although the
phonetic differences are not precisely the same from pair to pair, we can
suppose that they largely reflect differences in the nature and timing of
laryngeal adjustments made in association with the closing and opening of the
lips. A common effect of these differences is that the first word of each
pair is manifested as an acoustic event having a shorter interval of voicing
than the second. Since standard phonological analysis and orthography ascribe
this voicing difference to one between a phoneme /p/ and a phoneme /b/, it is
these phonemes that are characterized as voiceless and voiced, respectively.
But while it is enough to posit just two such phonemes in order to provide
distinct phonemic spellings of all phonetically different items in the
English lexicon that have labial stops, at least three, and possibly four,
types of labial stop are generally identified: the phoneme /b/ includes a
type with voiced closure and one with voiceless closure, and /p/ has both an
aspirated and an unaspirated variety of voiceless stop (Gimson, 1962; Trager
& Smith, 1951). Moreover, these three or four types may not make up a
complete inventory of the ways in which English speakers coordinate laryngeal
and supraglottal maneuvers when producing utterances that include labial
stops; they are at best adequate only for virtually all one-word utterances
of the language.




*Also Language and Speech, in press. This paper was presented at the 10th
International Congress of Phonetic Sciences, 1-6 August 1983, Utrecht.
†Also University of Pennsylvania.

As commonly formulated, the rules that relate phonemic spellings and
pronunciation are applicable to single phonological elements, and they have
as their domain these entities in certain specified contexts within single
words. Thus the phonologist's account of the English labial stops is a set of
instructions for pronouncing the letters /p/ and /b/ in the spellings of
English words. But these rules do not provide clear guidance to the
pronunciation of /p/ and /b/ in every context in which they are used. In
particular, they are silent about lip-larynx management in the case of
utterances for which the output of lip and larynx actions is represented by
two letters rather than one. Consequently the nature of the difference
between the events represented as /b/+/h/ in ABHOR and RUB HERE and /p/ in
APPEAR is not inferable from most accounts of English phonology. Nor can we
determine from this literature whether lip-larynx behavior for the forms
APPEAR, UPHOLD, and STOP HERE is essentially identical or significantly
different. If we do not uncritically accept the phonologist's narrow view of
phonetic specification as rules for the performance of the letters of the
representation, i.e., if we decline to believe that a phonological spelling
cum derivation rules is necessarily the same as a phonetic description, then
we may find that English speakers display a range of systematic variation in
lip-larynx coordination considerably greater than is implied by commonly
accepted descriptions of the English stop consonants. It may turn out, upon
an examination of the kind we describe below, that there is a physical basis,
in addition to the well-recognized phonological one, for considering lip and
larynx activity in ABHOR, RUB HERE, UPHOLD, and STOP HERE to be gestures for
two phonemes in sequence, while in APPEAR those gestures are associated with
a single element.

Procedure 

In order to gather data giving a more complete picture of lip-larynx
relations we made up a list of suitable sentences, as follows:



1. Let's tape each piece separately.
2. Let's play pinochle.
3. Let's just tape hit pieces.
4. Let Abe hit it hard.
5. Did Deb hear what he said?
6. A flip-pistol figured in the heist.
7. Who is Jeb Hill?
8. I don't play billiards.
9. I couldn't help hearing that.
10. I can't tell Pete anything.
11. I think there's a drip here.
12. This is called a drip-pit.
13. Don't trip Bill up.
14. Don't keep pills in your desk.
15. Don't keep bills in your desk.
16. Don't keep hymn books in your desk.
17. Why keep earrings like these?
18. Why keep hearing the same old songs?
19. Why keep peering at your watch?
20. Why keep beer cans in the sink?
21. Is this place light-tight?
22. Is this the right height?
23. Is this side higher?
24. There's a mint here for somebody.
25. Let's have some mint tea with dinner.
26. Let Herb pay the bill.
27. A glib political essay is easy.
28. We'll keep busy till April.
29. They pay plenty for lip service.
30. There's some tape print-through.
31. Let's stay put for a bit.
32. Shooting clay pigeons is great fun.
33. When did the trib hit the street?

One of us, a speaker of Greater New York City English, read the list aloud
ten times as the following information was recorded: acoustic waveforms,
glottal aperture as per transillumination (TI), intraoral air pressure,
anterior contact, and electromyographic signals from the interarytenoid (INT)
and posterior cricoarytenoid (PCA) muscles. We attempted, but failed, to
obtain satisfactory signals from the lateral cricoarytenoid. The recorded
signals were computer averaged after the ten tokens of each sentence were
aligned at the releases of the stops being examined. No other normalizations
were imposed.
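
The averaging step can be pictured as cutting a fixed window around the release in each token and taking the point-by-point mean. The sketch below is a minimal illustration under our own assumptions: signals are plain lists of samples, the release positions are already known as sample indices, and the function name is hypothetical. It is not a description of the laboratory's actual software.

```python
def release_aligned_average(tokens, release_idx, pre, post):
    """Average several tokens of one signal after aligning them at release.

    tokens      -- list of sample sequences, one per repetition
    release_idx -- sample index of the stop release in each token
    pre, post   -- number of samples kept before and after the release

    Returns the point-by-point mean of the windows [r - pre, r + post).
    No amplitude or time normalization is applied.
    """
    windows = []
    for samples, r in zip(tokens, release_idx):
        if r - pre < 0 or r + post > len(samples):
            raise ValueError("window extends beyond the recorded token")
        windows.append(samples[r - pre:r + post])
    n = len(windows)
    return [sum(column) / n for column in zip(*windows)]


if __name__ == "__main__":
    # Two toy tokens whose release peak sits at different sample positions.
    tokens = [[0, 0, 1, 5, 1, 0], [0, 1, 5, 1, 0, 0]]
    print(release_aligned_average(tokens, [3, 2], pre=1, post=2))
```

Because alignment is at the release only, events far from the release are smeared in the average whenever their timing varies from token to token.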

Results



Figure 1 shows average curves for 500 ms segments excerpted from
recordings of the sentences I DON'T PLAY BILLIARDS and LET'S PLAY PINOCHLE.
The vertical lines at the midpoints in each panel mark the onsets of the
release bursts of the /b/ and /p/ of the words BILLIARDS and PINOCHLE. The
curves are, for the most part, just what we should expect: the solid ones for
/b/ indicate no change in INT or PCA activity accompanying lip contact, nor
is there any sign of glottal opening. The dotted /p/ curves show INT
relaxation, PCA contraction, and an opening and closing of the glottis. There
are the expected differences in air pressure profiles for /b/ and /p/, as
well as differences in the durations of voicelessness or aspiration indicated
by the audio envelope curves. More noteworthy is the close similarity of the
articulatory contact patterns, which indicates that there is no difference in
closure durations.

Figure 2 shows averaged data for three sentences (#17, 18, 19), the
relevant phrases being KEEP EARRINGS, KEEP PEERING, and KEEP HEARING. The
word-final /p/ before the vowel in KEEP EARRINGS was produced with no
apparent glottal opening during the interval of labial contact and elevated
air pressure, although there was INT slackening and some PCA contraction.
(For some tokens of this sentence, the word-initial vowel was glottalized at
onset.) The picture for KEEP PEERING is very like the one for the /p/ of
PINOCHLE shown in Figure 1. The similarity amounts to identity in the
transillumination profiles, although the PCA and pressure signals are high
for a longer time in KEEP PEERING. Note that although INT and PCA adjustments
begin in time with the onset of the long closure of KEEP PEERING, the peak of
glottal opening is as closely synchronized with the release as in the case of
the simple aspirated /p/ of PINOCHLE.

Figure 1. Activity associated with the production of bilabial stops in the
sentences LET'S PLAY PINOCHLE and I DON'T PLAY BILLIARDS. Electromyographic,
transillumination, articulatory contact, intraoral air pressure, and audio
data are shown. Each curve is an ensemble average calculated from ten tokens
of each sentence. Ordinate scales have been omitted to simplify the figure.
Zero time represents oral release for the underlined stops.

Figure 2. Averaged data for word-final /p/ in three different sentence
environments.

In KEEP HEARING (the dotted curves of Figure 2) the beginning and peak of
glottal opening are about [illegible] ms later than in KEEP PEERING (the
dashed curves), and this is presumably to be connected with the difference in
the audio signal profiles, which suggests that voicing resumes later in KEEP
HEARING. This greater lag in the resumption of voicing is clearly not to be
explained by a greater glottal opening at the time of release, or by a
greater magnitude of peak opening.

The finding that the word-initial aspirated /p/ is released when glottal
aperture is maximum, as in PINOCHLE (Figure 1) and KEEP PEERING (Figure 2),
turns out, upon further examination of our data, not to hold generally for
this stop type. In Figure 3 the solid curves represent transillumination data
for the sentences LET'S PLAY PINOCHLE, THIS IS CALLED A DRIP-PIT, and I DON'T
PLAY BILLIARDS. The aspirated /p/s following LET'S, DRIP, and DON'T were all
released after the point of maximum glottal aperture was past. The glottal
aperture of LET'S PLAY is no doubt as much associated with the /s/ as with
the /p/, which may explain why it is early relative to the release. Perhaps
in all three sentences there is something about their prosodies that is a
factor in advancing the time of glottal opening and closing, but it is
nevertheless puzzling that the /p/ of LET'S PLAY is well aspirated though the
glottis at release is already two-thirds of the way to closure. This result
is especially puzzling in the light of other published data on /s-k/
sequences (Yoshioka, Löfqvist, & Hirose, 1981) and /s-t/ sequences
(Pétursson, 1978) with intervening word boundaries that show a second peak of
glottal opening centered at the release of the stops.

When we compare sentences said to involve /b/+/h/ and /p/+/h/ sequences,
as per Figure 4, we find little difference in contact patterns, in
transillumination profiles, or in the time at which the audio signals return
to full amplitude after the stop releases. The only difference in glottal
aperture patterns is the voicing ripple for the sequence with /b/ in contrast
to the smooth curve for /p/; the temporal courses and magnitudes of opening
are precisely the same. The INT and PCA patterns are also very much alike for
the two sequences. We note, of course, the expected differences in oral
pressure.

Summary

Our data appear to bear out the truth of the supposition motivating the
experiment just reported, namely, that a description of lip-larynx
coordination patterns limited to the /p/-/b/ contrast in such word pairs as
PILL-BILL, RAPID-RABID, and RIP-RIB fails to account for all the patterns to
be found in English. In all, at least so far, as many as five may be
enumerated: 1) Intervocalic /b/ is produced with no change in the settings of
the INT and PCA muscles or in the glottal aperture appropriate to the
neighboring vowels. 2) The unaspirated /p/ in intervocalic position is
accomplished with no discernible opening of the glottis, although there is
some PCA contraction and INT relaxation. 3) Sequences of word-final voiceless
obstruent and aspirated /p/ are produced with the PCA and INT adjustments
that serve to open the glottis, the peak of this opening being variable and
ranging from as early as 100 ms to just slightly before release. 4) An
aspirated /p/ following a vowel, but not in word-final position, is produced
with a glottal opening that peaks in close synchrony with the stop release.
5) Signal intervals interpreted as a labial stop followed by the phoneme /h/
show glottal openings that peak well after the release (VOT = +50 ms), with
the salient difference between /b/+/h/ and /p/+/h/ a matter of voicing over
the combined intervals of oral closure and glottal opening, despite the
absence of any observable difference in INT and PCA behavior.

Concluding Comment

The observed differences in glottal aperture profiles in relation to
supraglottal events cannot be entirely understood on the basis of our EMG
data, a fact that is not surprising in view of the limitations of this study.
It is generally agreed that while the PCA may be the only abductory muscle,
the lateral cricoarytenoid (LCA) and thyroarytenoid (TA) muscles as well as
the INT muscle play a role in vocal fold adduction, and hence in determining
the extent to which PCA contraction is effective in opening the glottis
(Sawashima & Hirose, 1983). We may therefore reasonably suppose that, had we
managed to tap one or more additional muscles of the larynx, we would be
better able to explain the apparent anomalies in the data on the type 2, 3,
and 5 patterns. Thus we might account for the finding that the unaspirated
/p/ is produced without glottal opening although INT and PCA signals favor
it. This finding is in agreement with Dixit's (1975) description of the Hindi
voiceless unaspirated stops, and at variance with the results reported by
Benguerel and Bhatia (1980). English speakers show considerable variability
in the frequency and degree to which such stops are "glottalized" (as judged
auditorily) and accompanied by separation of the arytenoid cartilages
(Sawashima, 1970), and it is possible that Hindi speakers are as free with
this feature as English speakers. EMG data reported both by Hirose, Lisker,
and Abramson (1977) and Dixit (1975) indicate that data on the LCA and TA
muscles would resolve the apparently contradictory findings. Such
information, in addition, would possibly tell us how the voicing difference
between /p/+/h/ and /b/+/h/ (Figure 4) is managed without any apparent
difference in PCA and INT activity.

As was said earlier, the greater duration of aspiration for /p/+/h/ than
for aspirated /p/ cannot be explained, as per Kim (1970), by a greater
magnitude of glottal aperture at release, but rather by the longer delay of
the laryngeal gesture relative to the labial release. At release the
aspirated /p/ has the greater aperture, but the glottis begins to close at
that time; the glottis is less open at the release of /p/ before /h/, but it
is still increasing in aperture. This may explain not only the difference in
the duration of aspiration, but also our auditory impression, one consistent
with a difference in their waveforms, that the release burst and the
aspiration for /p/+/h/ are both of weaker intensity.

Finally, it may be of some phonological interest that the degree of
overlap in lip-larynx activity is greater for the voiceless stop plus
aspiration that is interpreted as a single element than for those represented
phonologically as /p/+/h/ and /b/+/h/. It is tempting to infer from this that
the phonologist's decision as to whether one or two elements are involved is
phonetically based, but a comparison of our data with those reported for
Hindi by Dixit and by Benguerel and Bhatia forces us to recognize that the
decision is primarily dictated by morphosyntactic considerations. It is true
that English /p/+/h/ and Hindi /ph/, which may well be produced with equal
delays in voice onset, differ in that peak glottal opening is later for the
English two-phoneme sequence; English /b/+/h/ and Hindi /bh/, however, show
no similar difference to justify a claim that their different phonological
status derives from a phonetic difference. The basis for denying that English
possesses voiced aspirated stops and voiceless stops of two degrees of
aspiration is not phonetic at all. At the same time it can be said that
phonetic data of the kind presented above provide ancillary support for the
phonological distinction made between aspiration as one of the features of
/p/ and as an independent phonological element /h/ that freely occurs after a
large number of other elements, including /p/ and /b/.

CLOSURE DURATION AND RELEASE BURST AMPLITUDE CUES TO STOP CONSONANT MANNER AND
PLACE OF ARTICULATION*

Bruno Repp



Abstract. The perception of stop consonants was studied in a constant
neutral [s-l] context. Truncated natural [p], [t], and [k] release bursts at
two intensities were preceded by variable silent closure intervals. The
bursts, though spectrally distinct, conveyed little specific place
information but contributed to the perception of stop manner by reducing the
amount of silence required to perceive a stop (relative to a burstless
stimulus). Burst amplitude was a cue for both stop manner and place; higher
amplitudes favored t, lower amplitudes favored p responses. The silent
closure interval, a major stop manner cue, emerged as the primary place cue
in this situation: Short intervals led to t, long ones to p responses. All
these perceptual effects probably reflect listeners' tacit knowledge of
systematic acoustic differences in natural speech.

Silent closure duration is an important cue to the perception of stop
consonant manner, that is, of phonetic distinctions that rest on the
perceived presence versus absence of a stop consonant (e.g., Bailey &
Summerfield, 1980; Dorman, Raphael, & Liberman, 1979; Repp, 1984). The
question of principal interest in the present study was whether different
amounts of closure silence are needed to perceive stop consonants having
different places of articulation. Specifically, it was hypothesized that,
because labial stops generally have longer closure durations than alveolar
and velar stops in natural speech (e.g., Bailey & Summerfield, 1980; Menon,
Jensen, & Dew, 1969; Stathopoulos & Weismer, 1983; Suen & Beddoes, 1974),
longer intervals might be needed for their perception, too.

This hypothesis makes two semi-independent predictions: (1) Given
unambiguous cues to stop consonant place of articulation, more silence will
be needed to perceive p than t or k; that is, perception of stop manner, as
cued by closure duration, may depend on perceived place of articulation.
(2) Given ambiguous place cues and sufficient silence to perceive a stop
consonant, short closure silences will lead to t or k responses while long
silences will lead to p responses; that is, closure duration is a direct cue
to place of articulation. The first of these predictions is difficult to test
because the different acoustic configurations needed to specify place of
articulation unambiguously may have psychoacoustic effects on perception of
the closure silence, which are difficult to dissociate from phonetic effects
due to perceived place of articulation. The second prediction, however, can
be tested easily by varying silence duration in a constant acoustic
environment.

In a previous study addressing these issues, Bailey and Summerfield
(1980) used synthetic speech stimuli consisting of an initial [s] noise
followed by a variable silent interval and by a vocalic portion with or
without initial formant transitions. Two findings are relevant here. When
either the second formant of a steady-state vowel or the vocalic formant
transitions were varied so as to cue the perception of p, t, or k
unambiguously, the amount of silence required to perceive the stop consonant
did not vary significantly with place of articulation, except that it was
reduced for k cued by formant transitions. Bailey and Summerfield attributed
this latter effect to auditory energy summation caused by the proximity of
the second and third formants at vowel onset; that is, they assumed a
psychoacoustic rather than phonetic basis for the effect. The other finding
was that, when the place-of-articulation cues in the vocalic portion were
ambiguous, so that (given sufficient silence) the same acoustic pattern
elicited more than one type of stop response, p responses were clearly
preferred at longer closure durations, while t or k responses predominated at
short closures. The first, negative finding suggests that stop manner
perception is largely independent of perceived place of articulation. The
second finding, however, suggests that the listeners' internal perceptual
criteria for place of articulation do include closure duration as an
important acoustic dimension.

The principal aim of the present study was to replicate Bailey and
Summerfield's findings, using natural-speech stimuli that, instead of
variable formant frequencies or transitions, included release bursts
appropriate for each place of articulation. A second aim was to examine the
specific contribution of the release burst itself to stop manner perception.
As a rule, alveolar and velar stops following [s], in contrast to labial
stops, do not need any closure silence to be perceived as long as an intact
natural release burst is present (Repp, 1984). This difference in silence
requirements might be due to the higher amplitude and longer duration of
alveolar and velar bursts (Zue, 1976), and it might disappear when the
overall amplitudes of these bursts are reduced to resemble those of labial
bursts. In addition to examining this question, the present experiment also
investigated to what extent burst amplitude affects perception of stop manner
and place, following Ohde and Stevens (1983) and Repp (1984).

Two methodological decisions require justification. First, to exclude
cues to stop place of articulation in the signal portions surrounding the
critical cues of closure duration and release burst, these cues were embedded
in a constant [s-l] context. Preliminary observations suggested that [l]
resonances contain only weak (if any) formant transition cues to preceding
stop consonants, so this segment seemed ideally suited for the purpose.
However, this resulted in some consonant clusters ([stl] and [skl]) that are
unfamiliar to English speakers and listeners. It was assumed, however, that
these clusters would not be difficult to produce or perceive, and the results
tend to justify this assumption. Second, in order to make closure duration a
salient cue to stop manner at all three places of articulation, it was
necessary to reduce the natural release bursts, since full alveolar and velar
release bursts are generally sufficient cues for perception of a stop
consonant. This was done by waveform truncation and resulted in residual
bursts that were spectrally distinct but, as it turned out, conveyed
surprisingly little place information. The present study thus primarily
addresses the question of the role of closure duration as a cue when other
place-of-articulation cues are highly ambiguous.



Method

Stimuli

A number of repetitions of the utterances slat, splat, stlat, and sclat
were recorded by a male speaker of American English, low-pass filtered at 4.8
kHz, and digitized at 10 kHz. One good token of each utterance was selected
and manipulated further by computer waveform editing procedures. The release
bursts (i.e., the aperiodic signal portion preceding the first glottal pulse)
of splat, stlat, and sclat (originally 17, 43, and 43 ms in duration,
respectively) were excerpted and trimmed to 10 ms duration. This was done by
eliminating the final low-amplitude portions of the labial and alveolar
bursts. The velar burst, on the other hand, had several amplitude peaks, the
last and most pronounced of which happened to occupy the last 10 ms;
therefore, this final portion was taken as the truncated burst. Two versions of
each truncated burst were created by changing their amplitudes by 10 dB: The
labial burst was amplified by that amount while the alveolar and velar bursts
were attenuated. This was done because the labial burst had less
high-frequency energy than the other two bursts (see below). Each of these six
bursts was spliced onto the lat portion (365 ms long) derived from slat; thus,
the voiced portion immediately following each burst was constant and contained
no distinctive cues to place of stop articulation. A seventh, burstless
stimulus was included as a baseline. All seven stimuli were preceded by a
constant [s]-noise (226 ms long) derived from slat, and by a variable closure
interval. Closure intervals were varied from 0 to 100 ms in 20-ms steps, for a
total of 35 stimuli that were recorded in 5 different random orders.

Subjects and Procedure

Ten subjects (nine paid student volunteers and the author) listened to
the stimulus tape over TDH-39 earphones at a comfortable intensity
(approximately 76 dB SPL for vowel peaks) and identified the stimuli in writing
as beginning with sl, spl, stl, or scl. Instructions alerted subjects to the
unfamiliar consonant clusters.

Results and Discussion

Figure 1 compares the labeling function (percent stop responses,
regardless of place of articulation, as a function of closure duration) for
burstless stimuli with the average labeling function for the six types of
stimuli with bursts. As indicated in the figure by the horizontal bar at the
50-percent point, the average phonetic boundaries for the six burst conditions
varied over a 10-ms range, from 34.5 to 44.5 ms of closure silence. The boundary
for the burstless stimuli was clearly longer, at a nominal 50.5 ms of silence
(i.e., measured to the onset of the nonexisting burst), or at an actual 60.5
ms of silence (as indicated by the arrows in the figure). This difference was
exhibited by all subjects and was significant in a one-way analysis of
variance on the total percentage of stop responses, after applying a correction
for the conversion to nominal closure duration and after omitting the data for
the author, who showed the largest difference, F(1,8) = 16.6, p < .01. Thus,
the truncated release bursts made a significant contribution to stop manner
perception (cf. Repp, 1984); that is, the boundary was shortened by more than
the 10 ms expected if the presence of a burst merely had prolonged the
effective closure duration.

Figure 1. Effect of presence versus absence of release burst (percent stop
responses as a function of closure duration in ms). The solid
function is the average of all six burst conditions; the
horizontal bar indicates the range of 50-percent cross-overs. Closure
durations in the no-burst condition are nominal; actual durations
are indicated by arrows.

Figure 2. Comparison of category boundaries in six burst conditions: Effects
of burst category ([p], [t], [k]) and amplitude (dB).

Figure 2 shows the effects of burst category (intended place of
articulation) and amplitude on the stop manner boundary, still combining all
kinds of stop responses. Burst amplitude clearly had an effect: Amplification
of the labial burst increased stop responses (i.e., shortened the boundary)
while attenuation of the alveolar and velar bursts decreased stop responses.
Thus, burst amplitude could be traded against closure silence in stop manner
perception (cf. Repp, 1984). The effect of burst amplitude was significant in
an analysis of variance, F(1,9) = 8.4, p < .05. The main effect of burst
category was nonsignificant, and so was the interaction.

A comparison across the three burst categories is difficult because
amplitude differences are confounded with spectral differences. Overall rms
amplitudes were determined after redigitizing the stimuli without preemphasis.
Unexpectedly, the amplitude of the labial burst turned out to be 3 dB higher
than that of the alveolar and velar bursts, which were equal and 6 dB below
the amplitude of the [l] onset (the first 10 ms). This was apparently due to
a strong low-frequency component in the labial burst waveform. It is likely,
however, that the amplitude of higher-frequency components is more important
for stop manner perception, as has also been hypothesized by Ohde and Stevens
(1983) with regard to place of articulation perception. Figure 3 compares the
spectra of the three truncated bursts at their original amplitudes. As
expected, the labial burst had less energy than the alveolar and velar bursts
in the high-frequency regions above 2 kHz; the average difference is about 10
dB. Thus, amplification of the labial burst by 10 dB resulted in
approximately equal levels of high-frequency energy across the three burst
categories, which is consistent with the very similar stop manner boundaries
obtained (see Figure 2).
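
The two amplitude operations mentioned above (measuring an overall rms level
in dB and changing a burst's level by 10 dB) can be sketched as follows. This
is only an illustration under stated assumptions: the function names, the
reference level of 1.0, and the placeholder waveform are hypothetical and are
not taken from the original study.

```python
import math

def rms_db(samples, reference=1.0):
    """Return the rms level of a sample sequence in dB re `reference`."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 20.0 * math.log10(math.sqrt(mean_square) / reference)

def apply_gain_db(samples, gain_db):
    """Scale every sample by a gain given in dB (negative values attenuate)."""
    factor = 10.0 ** (gain_db / 20.0)
    return [x * factor for x in samples]

# Hypothetical 10-ms burst at a 10-kHz sampling rate (100 samples).
# The waveform is a placeholder tone, not a real release burst.
burst = [0.1 * math.sin(2 * math.pi * 3000 * n / 10000) for n in range(100)]

louder = apply_gain_db(burst, +10.0)   # e.g., the amplified labial burst
softer = apply_gain_db(burst, -10.0)   # e.g., an attenuated alveolar/velar burst

print(round(rms_db(louder) - rms_db(burst), 3))
print(round(rms_db(softer) - rms_db(burst), 3))
```

A 10-dB change corresponds to multiplying the waveform by 10^(10/20), roughly
3.16, so the rms difference between the scaled and original versions is 10 dB
regardless of the burst's spectrum.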

So far, stop responses have been treated as a single category. We turn
now to an analysis of stop responses by place of articulation. Figure 4 shows
conditional percentages of p, t, and k (i.e., spl, stl, and scl) responses in
separate panels as a function of closure duration (from 40 ms up) and of burst
category, combining the two burst amplitudes. The no-burst condition is also
plotted at the actual closure durations. It is evident that closure duration
provided the most important cue to stop place of articulation: At short
closures, t responses predominated (notwithstanding a possible bias against
reporting stl clusters) while, at long closure durations, the response was
overwhelmingly p. These trends held almost regardless of the nature of the
burst; [p] and [t] bursts, in particular, yielded highly similar results.
The results for [k] bursts resembled those for burstless stimuli, perhaps
because this late component of the burst did not preserve specific place of
articulation information. (Cf. the absence in the spectrum, shown in Figure 3,
of the pronounced mid-frequency peak that characterizes velar onset spectra,
according to Stevens and Blumstein, 1978.)

The similar perceptual results for [p] and [t] bursts, whose spectra did
exhibit the general spectral properties characteristic of these places of
articulation (see Figure 3 and Stevens and Blumstein, 1978; note that the
present spectra are not pre-emphasized), may have been due to their short
duration. According to Stevens and Blumstein (1978), the most salient place cue
is the onset spectrum computed over a window approximately 25 ms long; if so,
the 10-ms bursts were presumably integrated with the constant [l] onset
following them and thus lost much of their distinctiveness. The present
results show very clearly, however, that more than the onset spectrum is
involved in place perception: When spectral cues are ambiguous, closure
duration takes over as the salient place cue, as also observed by Bailey and
Summerfield (1980).

Figure 3. Spectra of the 10-ms [p], [t], and [k] bursts at their original
amplitudes, without pre-emphasis (frequency axis: 0 to 5 kHz). Spectra
were obtained by FFT analysis (program FDI of the ILS package),
using a 25.6-ms Hamming window whose left edge preceded the burst
onset by 10 ms. The spectra were smoothed by averaging across a
400-Hz rectangular window moving in 20-Hz steps.
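
The kind of analysis described in the Figure 3 caption (a Hamming-windowed
spectrum followed by smoothing with a rectangular window that moves in fixed
frequency steps) can be sketched roughly as below. This is not the ILS
program: the naive DFT, the zero-padding to 512 points, the choice to average
dB values rather than power, and the default 400-Hz width are all assumptions
made for illustration, and the test signal is a placeholder tone.

```python
import cmath
import math

def hamming(n):
    """Hamming window of n points."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def db_spectrum(samples, n_fft):
    """Hamming-windowed magnitude spectrum in dB for bins 0..n_fft/2.

    Uses a naive DFT (adequate for a few hundred samples); the frame is
    implicitly zero-padded to n_fft points.
    """
    windowed = [s * w for s, w in zip(samples, hamming(len(samples)))]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(windowed))
        spectrum.append(20 * math.log10(abs(acc) + 1e-12))
    return spectrum

def smooth(spectrum_db, fs, n_fft, width_hz=400.0, step_hz=20.0):
    """Average the dB values inside a rectangular window centred every step_hz."""
    bin_hz = fs / n_fft
    out = []
    f = 0.0
    while f <= fs / 2:
        lo, hi = f - width_hz / 2, f + width_hz / 2
        vals = [v for k, v in enumerate(spectrum_db) if lo <= k * bin_hz <= hi]
        out.append((f, sum(vals) / len(vals)))
        f += step_hz
    return out

FS = 10_000                      # sampling rate used for the burst stimuli
frame = [math.sin(2 * math.pi * 1000 * n / FS) for n in range(256)]  # 25.6 ms
smoothed = smooth(db_spectrum(frame, 512), FS, 512)
peak_hz = max(smoothed, key=lambda p: p[1])[0]
print(peak_hz)   # should lie near the 1000-Hz test tone
```

With a 10-kHz sampling rate, a 25.6-ms window holds 256 samples; padding to
512 points gives bins about 19.5 Hz apart, which is close to the 20-Hz step
named in the caption.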

The reason for the effectiveness of the closure duration cue presumably
lies in the well-known fact that [p] closures tend to be longer in natural
speech than [t] and [k] closures (although little is known about [stl] and
[skl] clusters).¹ An alternative, psychoacoustic explanation might be
proposed, however: that the preceding [s]-noise, with its strong high-frequency
components, left a trace in sensory memory that was integrated with the onset
spectrum following a short closure. Such integration might explain the bias
toward t responses at short closures, assuming that the predominating response
after removal of the preceding [s]-noise would be p (or, rather, b). Even
though research on adaptation in the auditory nerve (e.g., Delgutte & Kiang,
1984) predicts spectral contrast rather than integration, a brief additional
test was conducted to address this question. Ten randomized repetitions of
the seven stimuli (six with bursts and one without) without the initial
[s]-noise were presented for identification as lat, blat, dlat, or glat to a
new group of nine subjects plus the author. The results were mixed. Two
subjects responded randomly. Four subjects identified the burstless stimulus as
lat but labeled all others predominantly blat. The remaining four subjects
(including the author) distributed their responses more evenly, although
accuracy was poor (45 percent correct for stimuli with bursts; 100 percent for
lat). These results show, first, that the relative ineffectiveness of the
bursts as place cues in the present experiment was not due to the preceding
[s]-noise and closure. Second, although some subjects showed a strong bias
toward b responses, this bias was not so universal as to lend convincing
support to the hypothesis that the striking change from t to p responses with
increasing closure duration in the main experiment was due to spectral
integration. More likely, the effect of closure duration has a phonetic
origin. That is, listeners expect labial stops to have longer closures on the
basis of their knowledge of natural speech patterns.

Finally, Figure 5 provides a different breakdown of the data, which
reveals effects of burst amplitude on perceived place of articulation. The
conditional percentage of responses in each stop category, averaged over
closure durations from 40 to 100 ms, is shown for each of the six burst
category/amplitude conditions. "Correct" responses (i.e., responses reflecting
the place of articulation that the burst was intended for) are indicated by
the cross-hatched bars. It is evident that correct responses in each stop
category decreased as burst amplitude was modified, due to a higher percentage
of p responses for weak bursts and of t and/or k responses for strong bursts.
This result replicates earlier findings of Ohde and Stevens (1983) with
synthetic speech. Despite the relative weakness of the present bursts as
specific place cues, it appears that burst amplitude contributed to place as
well as manner perception.



Figure 5. Response distributions in the six burst conditions ([p] burst, [t]
burst, [k] burst), averaged over closure durations.

Conclusions

The present findings are consistent with many other results suggesting
that listeners possess detailed tacit knowledge of the acoustic correlates of
phonetic categories (see Repp, 1982, for a review). The perceptual criteria
derived from this knowledge apparently specify that labial stops ought to have
a longer closure interval than alveolar or velar stops. They also specify
that labial stops ought to have weaker release bursts; hence the effect of
burst amplitude on place of articulation perception. These perceptual criteria
presumably derive from experience with natural speech in its acoustic and
articulatory manifestations, and they provide the frame of reference within
which speech perception takes place.

Footnote

¹The speaker of the utterances for this experiment, an experienced
linguist, produced five tokens each of splat, stlat, and sclat. Average closure
durations were 111, 112, and 68 ms, respectively, revealing unusually long
values for [t]. For spat, stat, scat, and for sprat, strat, scrat, produced
by the same speaker, however, closure durations ranked [p] > [k] > [t].
Clearly, more data are needed to determine whether [-l] context is an
exception to the rule that labial closures are longest in duration.


EFFECTS OF TEMPORAL STIMULUS PROPERTIES ON PERCEPTION OF THE [sl]-[spl]
DISTINCTION*



Bruno H. Repp 



Abstract. Two studies investigated the influence of the
independently varied durations of preceding and following signal
portions on the amount of closure silence needed to perceive splash
rather than slash. Increases (or decreases) in the durations of the
[s] and [l] acoustic segments had opposite effects that cancelled
when the silent intervals were short (Exp. 1), but yielded a net
effect due to [s] duration when the silent intervals were long
(Exp. 2). These findings, which resolve a conflict between earlier
results in the literature, are interpreted as reflecting a perceptual
compensation for coarticulatory shortening of [s] before stop
consonants, in conjunction with (possibly psychoacoustic)
contrastive interactions between the perceived durations of adjacent
acoustic segments. The results suggest that local temporal signal
properties, as distinct from global perceived speaking rate, are an
important factor in phonetic perception.

An important perceptual cue for the distinction between the word-initial
clusters [sl] and [spl] is the absence versus presence of a silent interval
following the [s] noise (e.g., Bastian, Eimas, & Liberman, 1961; Fitch,
Halwes, Erickson, & Liberman, 1980). Two fairly recent studies have
investigated whether the category boundary on a continuum ranging from slit to
split, created by varying the duration of the silent closure interval, is
affected by reductions in total stimulus duration. Marcus (1978) found that
temporal compression left the slit-split boundary unaffected, whereas
Summerfield, Bailey, Seton, and Dorman (1981) found that less silence was
needed to perceive split in temporally compressed stimuli. Both studies made
use of modified natural-speech tokens of slit; Summerfield et al. also used
synthetic stimuli, with similar results. In an attempt to explain the
difference in outcomes, Summerfield et al. pointed out that the category
boundaries in the Marcus study were at considerably shorter silences (less than
30 ms) than the boundaries in their own study (around 60 ms). They conjectured
(as had Marcus) that a perceptual limit, perhaps related to an articulatory
limit, may be encountered at short silences, and that this may be the reason
why the boundary refused to shift to even shorter values in the Marcus study.
They interpreted their own findings as reflecting a perceptual adjustment to
variations in contextual speech rate.

The principal reason for conducting the present experiments was the
author's suspicion that temporal changes in signal portions preceding and
following the silence may not be equally relevant. Several earlier perception
studies in which closure silence duration was the dependent variable, albeit
for different phonetic contrasts, have found that the duration of the
preceding acoustic segment has a much stronger effect than that of the following
segment (Port & Dalby, 1982; Repp, 1979). In addition, there is another
reason to expect [s] duration to be important, quite regardless of perceived
speaking rate: Fricative noise duration tends to be shorter in [spl] than in
[sl] clusters (Morse, Eilers, & Gavin, 1982; Schwartz, 1969, 1970; see also
Haggard, 1973), and listeners may have tacit knowledge of this coarticulatory
relationship, as they do of so many others (see Repp, 1982). The duration of
[l], on the other hand, does not seem to exhibit such coarticulatory variation
(Morse et al., 1982; Repp, unpublished data) and therefore may be
perceptually irrelevant. To examine this hypothesis, the durations of the
fricative noise and of the lateral resonance were varied independently in the
present experiments.

Experiment 1 

Experiment 1 used a slash-splash continuum (from Repp, 1984: Exp. 7) for
which the average category boundary happened to be around 25 ms of silence,
similar to the short boundary obtained by Marcus (1978). This provided an
opportunity to test further the hypothesis of a lower limit for the perception
of silence duration in this context. While the reason for the short boundary
in Marcus's stimuli is not clear, that for the present stimuli was the
inclusion of a labial release burst (from splash), which provided an
additional stop manner cue (Repp, 1984).

Unlike the earlier studies, which used only temporal compression, the
present experiment introduced both decreases and increases in acoustic segment
duration. Although Marcus concluded from his results that the critical silent
interval was invariant under changes of speaking rate, he failed to
investigate the effects of decreases in simulated rate (i.e., increases in
stimulus duration). According to the speaking-rate adjustment hypothesis, the
perceptual boundary should shift to longer values of silence in that case,
since no perceptual limit is encountered in that direction.

The question in Experiment 1 was, then, whether either [s] duration or
[l] duration, or both, have any effect on a short-silence [sl]-[spl]
boundary.

Method 

Stimuli. The utterances slash and splash were recorded by a female
speaker, low-pass filtered at 9.6 kHz, and digitized at 20 kHz. To avoid
strong stop manner cues in the [s] portion, the fricative noise of slash was
used in all experimental stimuli. The remainder was taken from splash. This
portion included an initial 10-ms release burst, which preceded the first
glottal pulse of the [l] segment. The end of the [l] resonance was defined
visually by a change in waveform shape coupled with an amplitude increase, and
was confirmed by listening. The durations of the [s] noise and of the [l]
resonance were varied independently by either removing or duplicating a piece
of the waveform. An appropriate piece was selected from the interior of each
acoustic segment on the basis of overall and local envelope considerations,
and all cuts were made at zero crossings. In the [s] noise (original
duration: 142 ms), the piece removed or duplicated was 51 ms long and ended 36
ms before noise offset. In the [l] portion (original duration: 57 ms, or 14
pitch periods), it was 21 ms (5 pitch periods) long and began 28 ms (7 pitch
periods) after [l] onset. Thus, the signal portions immediately adjacent to
the closure interval were left undisturbed, so as to avoid changing spectral
and amplitude envelope cues to stop manner (cf. Summerfield et al., 1981:
Exp. 1). The ultimate durations were 91, 142, and 193 ms for the [s] noise,
and 36, 57, and 78 ms for the [l] resonance. (Note that the changes are
proportional and correspond to increases or decreases of about 36 percent.)
The orthogonal combination of all [s] and [l] durations resulted in nine
stimuli, for each of which silent closure duration was varied from 0 to 50 ms
in 10-ms steps. The resulting 54 stimuli were recorded in 5 different
randomizations with interstimulus intervals of 2 s.
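
The duration manipulation described above (removing or duplicating an interior
piece of waveform, with all cuts at zero crossings) might be sketched as
follows. This is a hypothetical illustration, not the editing program used in
the study: the function names are invented, the restriction to upward-going
zero crossings is an added assumption that keeps the joins smooth, and the
"resonance" is a placeholder sinusoid with a 4-ms period rather than real
speech.

```python
import math

def upward_zero_crossings(samples):
    """Indices i where the waveform goes from negative to non-negative."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] < 0 <= samples[i]]

def snap(samples, index):
    """Move a requested cut point to the nearest upward zero crossing."""
    crossings = upward_zero_crossings(samples)
    if not crossings:
        raise ValueError("no zero crossing found")
    return min(crossings, key=lambda i: abs(i - index))

def change_duration(samples, start, end, mode):
    """Remove or duplicate the piece between two snapped cut points."""
    a, b = snap(samples, start), snap(samples, end)
    if mode == "remove":
        return samples[:a] + samples[b:]
    if mode == "duplicate":
        return samples[:b] + samples[a:b] + samples[b:]
    raise ValueError("mode must be 'remove' or 'duplicate'")

FS = 20_000          # sampling rate of the Experiment 1 recordings (Hz)
PERIOD = 80          # assumed pitch period in samples (4 ms at 20 kHz)

# Placeholder 57-ms "resonance" (1140 samples).
resonance = [math.sin(2 * math.pi * (n + 0.5) / PERIOD) for n in range(1140)]

start = int(0.028 * FS)          # piece begins 28 ms after onset
end = start + int(0.021 * FS)    # requested piece length: 21 ms

shorter = change_duration(resonance, start, end, "remove")
longer = change_duration(resonance, start, end, "duplicate")
print(len(resonance), len(shorter), len(longer))
```

Because the cut points are snapped to zero crossings, the piece actually
removed or duplicated is a whole number of periods, which is why the
manipulated piece in the [l] portion could be stated in pitch periods.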

Subjects and procedure. Seven paid volunteers and the author identified
the stimuli as slash or splash, with stlash as an additional option. The tape
was repeated once, so that 10 responses per subject were obtained for each
stimulus. Presentation was over TDH-39 earphones at a comfortable intensity
in a quiet room.

Results and Discussion 



The results are shown in Table 1 in terms of category boundary locations,
determined from the average labeling functions by linear interpolation. (Only
three subjects gave any stlash responses, which were included with splash
responses.) Repeated-measures analysis of variance was conducted on individual
subjects' response percentages, averaging over silence durations. Increasing
silence duration, of course, had the expected effect of increasing the
percentage of splash responses; the labeling functions, which are not
presented here for the sake of conciseness, were comparable in steepness to
those obtained by Marcus (1978). As can be seen in Table 1, the amount of
silence needed to hear a p (or t) increased as the duration of the [s] noise
increased, F(2,14) = 12.5, p < .001, but decreased as the duration of the [l]
resonance increased, F(2,14) = 15.8, p < .001. Both effects were highly
consistent across subjects, approximately linear, and of similar magnitude.
Their interaction was not significant, F(4,28) = 1.1.
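
For readers unfamiliar with the procedure, locating a category boundary by
linear interpolation on a labeling function can be sketched as below. The
function name and the response percentages are hypothetical; the labeling
functions themselves are not reported in the text, so the numbers are
illustrative only.

```python
def category_boundary(durations_ms, percent_stop, criterion=50.0):
    """Return the silence duration at which the labeling function first
    crosses `criterion`, by linear interpolation between adjacent points."""
    points = list(zip(durations_ms, percent_stop))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion <= y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("labeling function does not cross the criterion")

# Silent closure durations used in Experiment 1 (0 to 50 ms in 10-ms steps).
silences = [0, 10, 20, 30, 40, 50]

# Made-up percentages of "splash" responses, for illustration only.
percent_splash = [2.0, 10.0, 35.0, 75.0, 95.0, 100.0]

boundary = category_boundary(silences, percent_splash)
print(boundary)   # 23.75 (ms of silence) for these made-up data
```

In this example the 50-percent point falls between the 20-ms and 30-ms steps,
so the boundary is 20 + (50 - 35) * 10 / (75 - 35) = 23.75 ms.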

Since increases and decreases in acoustic segment duration effected
boundary shifts of nearly equal magnitude, it appears that the [sl]-[spl]
boundary was not close to a lower limit. In fact, the boundary shifted to as
little as 17 ms of silence in the "short [s], long [l]" condition, which is
considerably shorter than any of Marcus's (1978) values. This suggests that
Marcus's failure to find any boundary shifts was not due to the relatively
short category boundary for his stimuli. Indeed, closer inspection of Table 1
reveals that Marcus's results are replicated by the present study: Due to the
opposite and equally sized effects of changes in [s] and [l] duration,
simultaneous proportional compression or expansion of both acoustic segments
had no effect on the [sl]-[spl] boundary. (Compare the values along
the major diagonal in Table 1; F(2,14) = 0.1.) Thus, to the extent that the
combined [s] + [l] duration conveyed anything about speaking rate, there was no
effect of this variable in the present study.

Table 1

Results of Experiment 1: Average category boundary values (in ms of silence)
as a function of [s] and [l] durations.

                              [l] duration (ms)
[s] duration (ms)        36        57        78       Mean

        91              24.0      18.8      17.2      20.0
       142              25.5      23.7      17.9      22.4
       193              27.6      24.2      23.8      25.2

      Mean              25.7      22.2      19.6      22.5
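
The marginal means and the "major diagonal" comparison in Table 1 can be
recomputed from the nine cell values, as sketched below. The cell layout used
here (rows for [s] duration, columns for [l] duration) is an assumption
inferred from the printed means and from the text; the variable names are
hypothetical.

```python
# boundaries[s_ms][l_ms]: category boundary in ms of silence (Table 1 cells).
boundaries = {
    91:  {36: 24.0, 57: 18.8, 78: 17.2},
    142: {36: 25.5, 57: 23.7, 78: 17.9},
    193: {36: 27.6, 57: 24.2, 78: 23.8},
}

# Mean boundary for each [s] duration (row means).
s_means = {s: sum(row.values()) / len(row) for s, row in boundaries.items()}

# Mean boundary for each [l] duration (column means).
l_means = {l: sum(boundaries[s][l] for s in boundaries) / len(boundaries)
           for l in (36, 57, 78)}

# Major diagonal: [s] and [l] shortened, original, or lengthened together.
diagonal = [boundaries[s][l] for s, l in zip((91, 142, 193), (36, 57, 78))]

print({s: round(m, 1) for s, m in s_means.items()})
print({l: round(m, 1) for l, m in l_means.items()})
print(diagonal)
```

The row means rise with [s] duration, the column means fall with [l] duration,
and the three diagonal entries differ by only a few tenths of a millisecond,
which is the cancellation described in the text.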



The observed effect of [s] duration on stop manner perception may be
attributed to the "rate" of the speech preceding the silence, which really
amounts to merely redescribing the results. An alternative explanation is in
terms of a perceptual compensation reflecting listeners' tacit knowledge of
the coarticulatory shortening of [s] frication preceding a stop closure. An
independent effect of fricative noise duration was also found by Summerfield
et al. (1981: Exp. 1); however, they tentatively attributed it to a
psychoacoustic effect of this variable on the perceived silence duration.
This hypothesis cannot be ruled out on the basis of the present data.
However, the "coarticulation-compensation" hypothesis proposed should perhaps
be favored in view of many related findings (see Repp, 1982).

The reversed effect of [l] duration was totally unexpected. Since [l]
duration in natural speech does not seem to covary with the presence versus
absence of a preceding [p], it is unlikely that [l] duration has any direct
cue value for stop manner perception, in the way that [s] duration has.
Rather than affecting stop manner perception directly, [l] duration may have
its effect by altering the perceived relative duration of the [s] noise. (See
Repp, Liberman, Eccardt, and Pesetsky, 1978, for a rather similar argument
relating to the fricative-affricate contrast.) In other words, the [s] noise
may "sound longer" before a short [l], and shorter before a long [l]. This
explanation assumes that the intervening silence does not engage in such
contrastive interactions with the surrounding signal portions; this
assumption is supported by the absence of any effect of increases or decreases
in both [s] and [l] duration.

Experiment 2 

It is not yet clear why Summerfield et al. (1981) did find an effect of
overall stimulus compression. One possibility is that their compression
technique affected the amplitude envelopes of the signal surrounding the
silence, thus introducing additional stop manner cues that shortened the amount
of silence required to hear a p. However, since their technique was similar to
Marcus's, and in fact left about 10 ms of waveform on either side of the
silence undisturbed, this possibility seems unlikely. Another possibility is
suggested by the results of Experiment 1, however: The hypothesis just
proposed to explain the effect of [l] duration predicts that the relational
dependence of perceived [s] duration on the context following the silence
should decrease with increasing temporal separation. Thus, at the longer
silent intervals that characterized the Summerfield et al. stimuli, the effect
of [s] duration may have been larger than the (presumably) opposite effect of
the signal duration following the silence, thus leading to a net effect in the
same direction as that of [s] duration alone.

It is also true, of course, that Summerfield et al. varied the duration
of the whole stimulus, and not just of [s] and [l]. It was decided,
therefore, to replicate their study using stimuli that had the category
boundary at a comparably long silent interval (which was achieved by removing
the labial release burst from the stimuli of Experiment 1 and by shifting the
range of silent intervals employed). The main difference was that, in
Experiment 2, the durations of [s], [l], and of the final [æʃ] portion were
varied independently, so as to determine their separate effects on the
slash-splash boundary.

Method 

Stimuli. The 10-ms release burst was removed from the stimuli of
Experiment 1. Two [s] and two [l] durations were employed, corresponding to the
original and shortened versions of Experiment 1. In addition, the final [æʃ]
portion was used both in its original version (477 ms) and shortened by 36
percent (304 ms). Shortening was achieved by deleting two separate pieces of
waveform from the interior of the [æ] vowel and one piece from the interior of
the [ʃ] noise, thereby reducing each of these two acoustic segments by the
same proportional amount. Careful listening indicated no obvious disruptions
of spectral continuity caused by the splices. The two [s] durations, two [l]
durations, and two [æʃ] durations were combined to yield eight stimuli that
were presented with six different silent intervals ranging from 50 to 100 ms
in 10-ms steps. The resulting 48 stimuli were recorded in five randomizations
with interstimulus intervals of 2 s.
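
The factorial design just described can be enumerated as in the sketch below,
which also draws five independent random orders. The names and the fixed seed
are hypothetical, and the original randomizations are of course not
recoverable; the point is only to show how 2 x 2 x 2 x 6 yields 48 stimuli.

```python
import itertools
import random

S_MS = (91, 142)               # [s] noise durations
L_MS = (36, 57)                # [l] resonance durations
VOWEL_FRIC_MS = (304, 477)     # durations of the final vowel-plus-fricative portion
SILENCE_MS = range(50, 101, 10)  # silent intervals: 50, 60, ..., 100 ms

# Every combination of the three segment durations and the silent interval.
stimuli = list(itertools.product(S_MS, L_MS, VOWEL_FRIC_MS, SILENCE_MS))

def randomizations(items, n_orders, seed=0):
    """Return n_orders independently shuffled copies of the stimulus list."""
    rng = random.Random(seed)   # seed chosen arbitrarily for reproducibility
    orders = []
    for _ in range(n_orders):
        order = list(items)
        rng.shuffle(order)
        orders.append(order)
    return orders

tape = randomizations(stimuli, 5)
print(len(stimuli), len(tape), sum(len(order) for order in tape))
```

Each of the five orders contains every stimulus exactly once, so one pass
through the tape gives five responses per stimulus and two passes give ten.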

Subjects and procedure. Ten paid volunteers listened to the tape twice,
labeling each stimulus as slash or splash. None of the subjects had
participated in Experiment 1. The stlash response category was not included,
since these responses generally occur only at short closure durations
(cf. Repp, in press). Otherwise, the procedure was identical to that in
Experiment 1.

Results and Discussion 



The results are displayed in Table 2. The average labeling functions
from which the boundaries were derived were less steep than in Experiment 1
but comparable to those obtained by Summerfield et al. (1981). The boundaries
were located at somewhat longer silences than in the Summerfield et al. study,
probably owing to procedural differences. It can be seen in Table 2 that the
basic findings of Experiment 1 were replicated: The amount of silence needed
to perceive splash increased as [s] duration increased, F(1,9) = 42.3, p <
.001, and decreased as [l] duration increased, F(1,9) = 20.5, p < .002. As in
Experiment 1, these effects were highly consistent and independent of each
other (F = 0.0 for their interaction). In contrast to Experiment 1, however,
and in agreement with the predictions for Experiment 2, the effect of [s]
duration was larger than the opposite effect of [l] duration, which supports
the hypothesis that the latter effect is indirect and decreases with
increasing temporal separation between [s] and [l], relative to the effect of
[s] duration. In a separate comparison of the results for the two stimuli that
differed by a uniform compression of 36 percent (the 66.7-ms and 73.2-ms
entries in Table 2), a significant 6.5-ms boundary shift was observed,
F(1,9) = 15.2, p < .004, which is comparable to the shifts found by
Summerfield et al.



Table 2

Results of Experiment 2: Average category boundary values (in ms of silence)
as a function of [s], [l], and [æʃ] durations.

                                           [s] duration (ms)
[l] duration (ms)   [æʃ] duration (ms)      91       142      Mean

       36                 304              66.7      75.4
                          477              69.7      81.7
                          Mean             68.2      78.6     73.4

       57                 304              62.6      75.3
                          477              64.4      73.2
                          Mean             63.5      74.3     68.9

      Mean                                 65.9      76.5     71.2
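
The relative sizes of the three duration effects, and the shift produced by
uniform compression, can be recomputed from the eight cell values of Table 2,
as sketched below. The assignment of cells to conditions is an assumption
reconstructed from the printed means (in particular, the 69.7 entry is
inferred from the row and column means), and the helper name is hypothetical.

```python
# boundary[(s_ms, l_ms, vowel_fric_ms)] in ms of silence (Table 2 cells).
boundary = {
    (91, 36, 304): 66.7,  (142, 36, 304): 75.4,
    (91, 36, 477): 69.7,  (142, 36, 477): 81.7,
    (91, 57, 304): 62.6,  (142, 57, 304): 75.3,
    (91, 57, 477): 64.4,  (142, 57, 477): 73.2,
}

def mean_where(factor_index, level):
    """Mean boundary over all cells in which one factor has the given level."""
    vals = [v for key, v in boundary.items() if key[factor_index] == level]
    return sum(vals) / len(vals)

# Main effects: boundary at the long level minus boundary at the short level.
s_effect = mean_where(0, 142) - mean_where(0, 91)
l_effect = mean_where(1, 57) - mean_where(1, 36)
vowel_fric_effect = mean_where(2, 477) - mean_where(2, 304)

# Uniform compression: all three segments original versus all three shortened.
compression_shift = boundary[(142, 57, 477)] - boundary[(91, 36, 304)]

print(round(s_effect, 2), round(l_effect, 2), round(vowel_fric_effect, 2))
print(round(compression_shift, 2))
```

On these values the [s] effect is more than twice the size of the opposite
[l] effect, and the uniform-compression comparison gives the 6.5-ms shift
reported in the text.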



The effect of the duration of the [æʃ] portion was less consistent.
There was a small but significant main effect, F(1,9) = 6.2, p < .04, as well
as a significant interaction with [l] duration, F(1,9) = 10.4, p < .02. As
can be seen in Table 2, the effect of [æʃ] duration was reversed with respect
to that of [l] duration, longer [æʃ] durations leading to longer category
boundaries, except in the condition where both [s] and [l] were long. While
the reason for this interaction is not clear, the direction of the main effect
suggests that, rather than influencing perceived [s] duration, the [æʃ]
portion may have modified the perceived [l] duration, which then in turn
influenced the perceived [s] duration. In other words, there may be a general
contrastive interaction between adjacent energy-carrying acoustic segments of
the speech signal with respect to their effective temporal features in
phonetic perception. The effect of [æʃ] duration (but not that of [l] duration)
is also consistent with a "contextual speech rate" explanation, but is too
small to be of any significance. Clearly, the dominant effect is that of [s]
duration.






General Discussion 

The present results eliminate the apparent contradiction between the earlier
results of Marcus (1978) and Summerfield et al. (1981), and they also
rule out some of the interpretations advanced by these authors. They suggest
that Marcus's failure to find a shift of the [sl]-[spl] boundary as a function
of stimulus compression was due neither to a perceptual limit nor to any
insensitivity of the boundary to contextual influences. Rather, as Experiment
1 has shown, even boundaries at very short silences are highly sensitive to
context and shift freely to both longer and shorter silences. (See also Repp,
1983: Exp. 4, for a shift to very short silences induced by a restricted
stimulus range.) The absence of a net effect of stimulus compression or expansion
when the silence duration is short seems to be due to the presence of two
opposite effects, of [s] duration and [l] duration, respectively, which are
equally strong and thus cancel each other out. Another way of expressing this
result is that the [s]/[l] duration ratio remains constant at the phonetic
boundary. On the other hand, Experiment 2 has shown that, when the silence
durations are longer (as in the study by Summerfield et al.), the [s] duration
effect is larger than the [l] duration effect, so an effect of overall
compression is obtained. This overall effect does not seem to reflect an
adjustment to perceived global speaking rate but may be due to [s] duration
alone, assuming that [s] duration is perceived relative to the context following
the silence. As the temporal separation increases, the influence of this
context on perceived [s] duration decreases in importance.

The effect of [s] duration is interpreted here as a perceptual compensa- 
tion for the known reduction in fricative noise duration when it precedes a 
stop consonant closure. Thus it is considered a purely phonetic effect, 
deriving from listeners' tacit knowledge of speech patterns (Repp, 1982). 
This hypothesis predicts that no such effect should be obtained in analogous 
nonspeech stimuli, a prediction that obviously should be tested. In a more
speculative vein, the reversed effect of [l] duration is attributed to some
form of perceptual contrast among temporal stimulus properties. It is not
clear at what level in perception this contrast might arise, but experiments
with nonspeech stimuli should also prove revealing in that regard.

While the present results disconfirm the hypothesis that the [sl]-[spl]
boundary shifts as a function of global contextual speech rate, the data are
compatible with the assumption that listeners compute a variable running estimate
of speaking rate on the basis of local temporal properties of the speech
signal. In fact, this alternative hypothesis allows for contrastive interactions
among adjacent segments whose relative durations deviate from the ratios
commonly encountered in natural speech. Accordingly, the effect of [s] duration
may be attributed to the listener's estimate of the local speaking rate
at that time, based on [s] duration relative to the following context. While
this account provides an alternative to the hypothesis of perceptual compensation
for [s]-stop coarticulation, the latter hypothesis is to be preferred because
speaking rate is not a quantity that varies from segment to segment in
speech production; hence, to postulate a corresponding, continuously varying
perceptual dimension is of questionable explanatory value. Moreover, as Miller,
Aibel, and Green (1984) have recently shown, perceptual effects of local
temporal stimulus properties are independent of subjects' estimates of (global)
rate of articulation. The only serious alternative to the coarticulation-compensation
account, therefore, is a purely psychoacoustic explanation
based on auditory temporal contrast, which needs to be tested directly in future
experiments.

Footnote

Subjects' comments after a brief preview of the tape revealed that most
stimuli were initially perceived as splash. All subjects were consequently
encouraged to try to hear more instances of slash, and to classify ambiguous
stimuli as belonging to this category. No subject had any difficulty carrying
out these instructions, which were probably unnecessary, since phonetic boundaries
based on closure silence duration are rather sensitive to the range of
closure durations employed in a test (Repp, 1980). Most likely, the listeners
in Experiment 2 rapidly adjusted their
THE PHYSICS OF CONTROLLED COLLISIONS: A REVERIE ABOUT LOCOMOTION*



Peter N. Kugler,† M. T. Turvey,†† Claudia Carello,††† and Robert Shaw††††



"No fact of behavior, it seems to me, betrays the weakness of the
old concept of visual stimuli so much as the achieving of contact
without collision — for example, the fact that a bee can land on a 
flower without blundering into it. The reason can only be that
centrifugal flow of the structure of the bee's optic array specifies
locomotion and controls the flow of locomotor responses"



"But to understand, to be able to explain and predict, entails the
knowing of laws. It is our own fault if we do not know the laws"

(From Gibson's autobiography in E. Reed & R. Jones (Eds.), Reasons
for realism: Selected essays of James J. Gibson. Hillsdale, NJ:
Erlbaum, 1982, pp. 14 and 15, respectively.)



Introduction 

Imagine the following scenario. It is late in the afternoon and since 
early morning you have been mulling over a long-term concern of Gibson's 
(1950, 1960, 1961, 1966, 1979), namely, the optical structure ambient to an
animal that is generated by the layout of surfaces and by the animal's movements
(both the movements of its limbs relative to its body and the movements
of its body, as a unit, relative to the surface layout). You are taken by the 
subtlety of Gibson's point that this optical structure resembles neither the 
surface layout nor the movements but it is specific to them because it is nom- 
ically (lawfully) dependent on them. And you are impressed by Gibson's 
insistence that these dependencies between properties of the animal-environ- 
ment relation and properties of the ambient light are instances of laws, 
indigenous to the ecological scale (the scale of animals and their environ- 
ments), that make possible the control of activity. 



*To appear in W. H. Warren, Jr. & R. E. Shaw (Eds.), Persistence and
change: Proceedings of the First International Conference on Event
Perception. Hillsdale, NJ: Erlbaum, in press.
†Crump Institute for Medical Engineering, University of California, Los
Angeles.
††Also University of Connecticut.
†††State University of New York at Binghamton.
††††University of Connecticut.
Your thoughts return repeatedly to locomotion, Gibson's favorite example,
and to his characterization of locomotion as a matter of controlled encounters
(Gibson, 1979) with the substantial surfaces that comprise the objects and
places of the animal's niche. In the course of locomoting, an animal's
surfaces may contact surfaces of the environment. These contacts are
selective and they vary in intensity. There are hard contacts (as in predatory
attacks), medium contacts (as in diving into water), soft contacts (as in
alighting on a branch) and non-contacts (as in steering between trees). It
seems to you that it might prove helpful to know what happens to bodies, in
general, when they collide. And to this purpose, you direct your reading to
the physics of collisions (summarized in the Appendix).

Your attention begins to wander. Looking out the window you see a bird
in flight (Figure 1). You admire its ability to adjust its flight to the 
surroundings. Your thoughts meander — "laws," "controlled collisions," "a 
physics of the ecological scale." You fall asleep and dream...



The Reverie 



You are a physicist investigating a type of visible particle whose identity
is unknown to you. Particles of this type range in mass from .001 kg to
10,000 kg. You watch the trajectory of a token particle through a non-uniform,
three-dimensional surround as depicted in Figure 2. In some regions of
the surround, matter or energy is more concentrated than in other regions.
The particle sometimes moves between the particularly dense regions and sometimes
it contacts them. The particle's speed is not uniform. There are obvious
decelerations and accelerations prior to contact, but these are not uniform
either. Sometimes contact is preceded by a marked deceleration so that
the contact is gentle: very little momentum is exchanged. Sometimes the
deceleration prior to contact is hardly noticeable or there is an obvious
acceleration so that the contact is violent or hard: a great deal of momentum
is transferred to the contacted region. And sometimes the deceleration is in
an intermediate range, such that the contact is neither gentle nor especially
violent.

Not all of the particularly dense regions of the surround are stationary.
Some regions move just like the particle. Other regions move, but without the
variations in accelerations that characterize the particle. Basically, the
particle's trajectory with respect to the moving parts of the surround is not
different from its trajectory with respect to the stationary parts: there is
a steering among moving regions and contact (ranging from soft to hard) with
moving regions.

Repeated observation of the particle's behavior with respect to the surround
leads you to certain tentative conclusions as to its nature.

Conclusion 1: In tracking the particle's behavior, you monitor the
mechanical quantity of momentum. The rate of change of momentum identifies a
force or interaction between particle and surround. Usually momentum and its
first derivative prove sufficient for the purpose of describing a given particle's
trajectory. For the behavior of this particle, however, it seems that
there is another mechanical quantity that is much more relevant: the second
derivative of momentum or the rate of change of force. Characteristically, as
the particle approaches a region of the surround, it exhibits a systematic sequence
of accelerative changes. You wish to give this mechanical quantity a
name. "Jitter" comes to mind, but for obvious reasons you are attracted to
"control" and you make note of the control quantity's relation to the more familiar
mechanical quantities of momentum, impulse, and force (Table 1).



Figure 2. As the particle moves through a non-uniform surround, it sometimes
steers between dense regions (1 and 4), sometimes contacts them
gently (2) or violently (3), and does not maintain a uniform speed.



Table 1

QUANTITY    SYMBOL    COMPOSITION    DIMENSIONS

MOMENTUM    P         MV             MLT^-1
IMPULSE     I         ΔMV            MLT^-1
FORCE       F         ΔMV/T          MLT^-2
CONTROL     C         ΔMV/T^2        MLT^-3

[Where M = mass, V = velocity, T = time, Δ = change, L = length.]



The control of a collision (read in the same sense that one would read
"the momentum of a collision" or "the force of a collision") is, therefore,
measurable. It would be given by the integration of C within the spatial
and/or temporal limits of the collision, assuming that they can be reasonably
approximated. Because the mechanical quantity of control is
a natural extension of the mechanical quantity of force, you are willing to
speculate that there is a (scalar) quantity that relates to control in the
manner that potential (a term referring to the concentration or distribution
of a conserved quantity such as energy) relates to force. Ordinary language
usage suggests the term coordination for this quantity. The suggestion is
fortunate: both "potential" and "coordination" are configurational notions.
You are tantalized by this idea that the conceptions of control and coordination
may be interpreted as mechanical quantities that are as principled in
their relation to one another as are force and potential.
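
As a rough numerical illustration of the momentum, force, and control quantities, the sketch below differentiates a sampled momentum record twice to estimate force and control, then integrates the control over a collision window. The sampling interval, the helper names `derivative` and `integrate`, and the example trajectory are assumptions made for illustration; none of them comes from the original text.

```python
def derivative(values, dt):
    """Central-difference time derivative (one-sided at the two ends)."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        out.append((values[hi] - values[lo]) / ((hi - lo) * dt))
    return out


def integrate(values, dt, start, stop):
    """Trapezoidal integral of values between indices start and stop."""
    total = 0.0
    for i in range(start, stop):
        total += 0.5 * (values[i] + values[i + 1]) * dt
    return total


# Hypothetical approach: a 2 kg particle whose velocity follows
# v(t) = 4 - t**2 (m/s) over two seconds, so that P = M * V.
dt = 0.01
mass = 2.0
times = [i * dt for i in range(201)]
momentum = [mass * (4.0 - t * t) for t in times]   # P, dimensions MLT^-1
force = derivative(momentum, dt)                   # dP/dt, dimensions MLT^-2
control = derivative(force, dt)                    # dF/dt, dimensions MLT^-3

# Integrating C over an assumed collision window (t = 0.5 s to 1.5 s)
# recovers the net change in force across that window.
change_in_force = integrate(control, dt, 50, 150)
```

Because C is the rate of change of force, its integral over the window equals the difference between the force at the end and at the start of the window, which is the sense in which the control of a collision is measurable.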

Conclusion 2: It is evident that while proximity to things in the surround
is a determinant of the forces forming the particle's trajectory, it is
neither the sole determinant nor the most significant. Conventional particle
trajectories are shaped by interactions with regions (usually other particles)
that attract or repel a particle to varying degrees depending on the
particle's distance from them. A force that is a function only of distance is
termed "conservative." The forces affecting the trajectory of your particle
seem to depend on time (the time to contact) and, perhaps, velocity (the
velocity prior to contact). They are non-conservative forces. You guess that
these forces, which entail a dissipation rather than a conservation of energy,
originate in the particle rather than in the surround. There is something
special about this particle; it seems to have (on board) a replenishable
source of potential energy that it can deploy.
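
The distinction between conservative and non-conservative forces can be checked numerically: a force that depends only on position does no net work over an out-and-back trip, whereas a velocity-dependent force does. The sketch below is illustrative only; the spring and drag coefficients, the path, and the function names are hypothetical choices, not values from the text.

```python
def round_trip_work(force, x_start, x_end, speed, steps=1000):
    """Work done by force(x, v) along x_start -> x_end -> x_start at constant speed."""
    dx = (x_end - x_start) / steps
    v_out = speed if dx > 0 else -speed
    work = 0.0
    # Outbound leg, evaluated at the midpoint of each small step.
    for i in range(steps):
        x_mid = x_start + (i + 0.5) * dx
        work += force(x_mid, v_out) * dx
    # Return leg: same positions, opposite direction of travel.
    for i in range(steps):
        x_mid = x_end - (i + 0.5) * dx
        work += force(x_mid, -v_out) * (-dx)
    return work


def spring(x, v):
    """Conservative: depends on position only (assumed stiffness 3.0)."""
    return -3.0 * x


def drag(x, v):
    """Non-conservative: depends on velocity (assumed coefficient 0.5)."""
    return -0.5 * v


spring_work = round_trip_work(spring, 0.0, 2.0, speed=1.5)
drag_work = round_trip_work(drag, 0.0, 2.0, speed=1.5)
```

The spring returns on the way back what it took on the way out, so its round-trip work is zero; the drag removes energy on both legs, which is the dissipation that a replenishable on-board energy source would have to make good.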

Conclusion 3: The number of soft, medium, hard, and non-collisions
exhibited by your particle during a period of observation is very large. Given
so many interactions, you think it worthwhile to adopt a statistical
mechanical orientation toward the particle's behavior. It seems particularly
promising to inquire about the distribution function that characterizes the
many interactions of the particle and surround. In the tradition of Boltzmann,
Maxwell, and Zipf you look to the distribution function as a way of appreciating
the constraints (the quantities that must be conserved) on the interactions
of particle-like entities. Relatedly, you see the usefulness of the
distribution function for classifying particles. Types of interactions will
be broadly distinguished by the quantities conserved over interactions; these
differences in conservations will show up as differences in distribution functions
given that a distribution function is completely determined by the
operative conservations.

In the construction of a distribution function one asks, roughly, how
many particles (in any arbitrarily chosen volume) will possess a particular
value of a particular quantity. Boltzmann and Maxwell focussed on gases and
the property of velocity. Over the very many interactions of n particles of a
gas, the conservations of total mass (nmv^0), momentum (nmv^1) and vis viva
(nmv^2, or twice the kinetic energy) determine that the particles will tend to
move at one particular speed, more or less. Collectively, the conservations
select ("prefer") a distance between collisions (mean free path) and a time
between collisions (mean relaxation time). The mean and variance (the "more
or less") of the velocity reflect the concentration of the conserved quantities.
The mean and the variance of the velocity prove to be characteristics
of a gas, and both are affected by its temperature.


Thinking about your particle in comparison to a gas particle, you are of
the opinion that the contrast between the two is most sharply drawn with respect
to momentum change in relation to velocity. Impulses of gas particles
are of maximal frequency when the velocity of the particles is zero, that is,
at the moment of impact. At any other moment impulse is nonexistent.
Statistically, your particle could be assigned a mean free path and a mean
relaxation time but, importantly, across the full range of velocities that it
exhibits, impulses can be observed. Unlike the case with gas particles, there
is no velocity at which the frequency of impulses is concentrated.

You imagine a distribution function defined over three coordinates: number
of particles, velocity, and number of impulses. For a typical gas and for
particles of the type you are studying, the distribution functions differ significantly.
The peaking of impulse frequency at zero velocity that reflects
the conservations governing the gas will not be found in the distribution
function of your particle type. What does the absence of a peak (the fact
that impulse is uniformly distributed over velocity) mean? The distribution
function for your type of particle must be the way it is because of the conservations
that are operative. This is true by definition. However, the conservations
governing your particle's behavior cannot be the typical velocity-linked
conservations of mass, momentum, and energy. Governing your particle's
behavior are conservations that are velocity indifferent.

Conclusion 4: Although you are unable for the present to say much about
the selectivity of the trajectory (the fact that some regions function as
attractors and some as repellers), it is clear to you that the particle's
trajectory minimizes the momentum transferred to the particle from the surround.
To what sort of principle is the particle subject that demands no "momentum
bumps"? If the particle's interior was complex and if its persistence
depended on maintaining that interior, then keeping the level of momentum
absorption below some critical value would clearly be important: large
transfers of momentum could fracture the particle (see Appendix). At the level
of the particle this principle reads: Move to conserve a smooth unitary
process ('smooth' meaning no sudden energy or momentum bumps, that is, no excessive
energy or momentum exchanges, and 'unitary' meaning that the characteristic
form and function of the particle is preserved). As a physicist, however, you
might be uncomfortable with a conservation that is (1) defined at the level of
the individual particle and (2) not identified with a quantity. The traditional
conservations of mass, energy and momentum are in reference to measurable
physical quantities exhibited by the particle. Further, the invariant
nature of these quantities is not defined at the level of the individual
particle, but minimally at the level of a pair of interacting particles. For
example, with regard to momentum conservation, the momentum of each of two
individual particles may change with a collision but the summed momentum of
the two particles after collision equals the summed momentum of the two particles
before collision.
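
The point that momentum is conserved only at the level of the interacting pair can be illustrated with a one-dimensional elastic collision. The sketch below uses the standard textbook formulas for such a collision; the masses and velocities are arbitrary illustrative values, and the function name is hypothetical.

```python
def elastic_collision(m1, v1, m2, v2):
    """Post-collision velocities for a one-dimensional elastic collision."""
    total = m1 + m2
    u1 = ((m1 - m2) * v1 + 2.0 * m2 * v2) / total
    u2 = ((m2 - m1) * v2 + 2.0 * m1 * v1) / total
    return u1, u2


# Assumed example: a 2 kg particle moving at 3 m/s meets a 1 kg particle
# moving at -1 m/s.
m1, v1 = 2.0, 3.0
m2, v2 = 1.0, -1.0
u1, u2 = elastic_collision(m1, v1, m2, v2)

# Each particle's own momentum changes ...
change_1 = m1 * u1 - m1 * v1
change_2 = m2 * u2 - m2 * v2

# ... but the summed momentum and the summed vis viva (m * v**2) do not.
momentum_before = m1 * v1 + m2 * v2
momentum_after = m1 * u1 + m2 * u2
vis_viva_before = m1 * v1 ** 2 + m2 * v2 ** 2
vis_viva_after = m1 * u1 ** 2 + m2 * u2 ** 2
```

The two individual changes are equal and opposite, so the invariant appears only when the pair is taken as the unit of analysis.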

Your discomfort with the notion of a conservation of a smooth, unitary
process might be alleviated (but not eliminated) by the observation that some
of the so-called quantum numbers conserved in the collisions of sub-atomic
particles denote a qualitative property (the class of the particle) that is
invariant at the level of the individual particle. You note how well leptons
(approximately eight particles that do not take part in "strong" interactions)
conserve their class membership; accelerating a lepton such as the positron
to the point where its mass is equal to that of a proton (a member of the
baryon class of particles that do take part in "strong" interactions) does not
result in a metamorphosis. Nevertheless, you would be happier with a more
traditional orientation to conservation given the size of the particle you
are studying. You suppose that your particle might be a member of a class.
Is there a conserved quantity defined at the level of the class? For example,
over the many trajectories of the many members of this class, perhaps the number
of members is conserved. If a quantity such as the latter had to remain
constant, then the minimization of momentum transfer from surround to particle
(and hence the conservation of a smooth, unitary process) would be rationalized.

Conclusion 5: You recognize that a circumstance, such as the one you are
studying, in which forces are shaped to achieve one trajectory and to prevent
others, usually defines a machine. Somehow a machine conception must be
brought to bear on your understanding of the particle. Because a machine is a
way of harnessing mechanical forces to do work in determinate directions, a
machine can be properly termed a constraint, a restriction on the laws of motion.
Very often a machine is constructed with hard, resistant pieces linked
by hard, resistant chains. Is your particle a hard-molded machine like this?
What makes you dubious is that a hard-molded machine is not very flexible and
the particle's trajectory indicates that the shaping of force to achieve gentle,
medium, and violent collisions, or to avoid collisions, is flexible. The
rate of change in the rate of change of the particle's momentum (i.e., the
control) varies from region to region of the surround. The unavoidable
conclusion is that the forces are harnessed by a constraint that cannot be
hard-molded. To draw the comparison, you might say that the constraint on the
non-conservative forces centered in the particle is "soft" rather than "hard"
and that the appropriate machine conception is soft-molded rather than
hard-molded.
Conclusion 6: Obliged to avoid action at a distance you make the assumption
that the soft constraint on the particle-based forces is a field. This
field is ambient to the particle. Is it a field associated with a force, a
quantum field? Of the four fundamental forces, only the gravitational and
electromagnetic forces apply, given the magnitude of the particle. The
electromagnetic field would seem to be a better bet than the gravitational,
but neither is particularly appealing because you are convinced that if the
soft constraint is a field, it cannot be a field associated with a force. It
may well be caused by electromagnetic phenomena but it is qualitatively different
from them. Your conclusion follows in part from certain distinctions
drawn by Pattee (1972, 1977). Forces and constraints are not things of the
same type, even though constraints, like all other aspects of nature, are
built from the four fundamental forces. To begin with, the forces are not embodied
in anything particular and they apply to everything within the range to
which they apply (gravity, for instance, applies everywhere). A constraint,
however, has a particular embodiment and applies to a particular thing.
Further, whereas the important feature of a force, its magnitude, is dependent
on rate (the derivative of a variable or variables with respect to time), the
important feature of a constraint, its selectivity (resulting in one directed
motion rather than others), is not dependent on rate.

Conclusion 7: It is a small step from the preceding conclusion to the
conclusion that if the field in question is not a force field, then the fundamental
dimensions from which its relevant variables are constructed cannot include
mass (M). That is, the field must be kinematic, of fundamental dimensions
length (L) and time (T), or geometric, of fundamental dimension L, but
it cannot be kinetic, of fundamental dimensions M, L, and T. As you have
already noted, this field must constrain the dissipative forces focused in the
particle so as to keep to a minimum the momentum transferred to the particle
from the surround. You puzzle over this requirement. Doesn't it mean that
the field in question must be structured by the kinetics of the surround and
the kinetics of the particle? If the field did not faithfully reflect these
two kinetic domains, then there would be no lawful basis for relating forces
originating in the surround to forces originating in the particle, and the
exchange of momentum could not then be regulated. You suppose, therefore,
that the field in question has this capability and inquire what this tells you
about the general properties of the field.

To bring things into focus, you assume (i) the particle to be in motion
at a constant velocity in one direction and (ii) an absence of motion in the
surround. Normally you would represent this by a velocity vector originating
in the particle and pointing in the direction of travel. However, you find it
convenient to think of the field hydrodynamically, as fluid flowing relative
to the particle. So instead of assigning a velocity vector to the particle
(because you regard it as the origin), you assign a velocity vector (the
negative of the particle's velocity) to each point in the field, where each
field point can be anchored to a surround point.
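
The change of reference frame described here can be sketched in a few lines: with the particle taken as the origin, every surround point is given the negative of the particle's velocity, and the rate at which each point's bearing changes follows directly. The two-dimensional setup, the sample points, and the function names below are assumptions made for illustration.

```python
def field_velocity(particle_velocity):
    """Velocity assigned to every field point: the negative of the particle's."""
    vx, vy = particle_velocity
    return (-vx, -vy)


def bearing_rate(point, particle_velocity):
    """Rate of change (rad/s) of a point's bearing as seen from the particle."""
    x, y = point
    fx, fy = field_velocity(particle_velocity)
    return (x * fy - y * fx) / (x * x + y * y)


# Assumed example: the particle sits at the origin and heads along +x.
velocity = (2.0, 0.0)
points = [(10.0, 0.0), (10.0, 1.0), (10.0, -1.0), (5.0, 5.0)]
rates = [bearing_rate(p, velocity) for p in points]
```

The point lying straight ahead has a bearing rate of zero, while points on either side swing away from the heading in opposite senses. In this simplified picture, that stationary point is where the flow appears to originate.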

This vector flow field viewed strictly as a kinematic field is always at
equilibrium; subsequent to a disturbance there is no tendency on the part of
the field to restore the structure it had prior to the disturbance. Further,
from the perspective of the flow field, a disturbance is reversible in that
any disturbance and its reverse are energetically equal. This reversible,
equilibrium character of the flow field arises because the flow field is not paying
the energy cost, so to speak, of its changes. That bill is being paid by
the kinetic field (the particle) to which the flow field is coupled: Only
changes in energy flux can give rise to changes in flow, and the changes in
energy flux in this case are bound to the particle's on-board energy
reservoirs or potentials.

The reversibility of the flow field appears to be of paramount importance.
If the flow field were not reversible, if it carried potentials that
"wound-up" the trajectories, then the flow field would itself determine some
of the properties it exhibits. A reversible field, on the other hand, meets
the criteria of linearity (superposition and proportionality) and can, therefore,
faithfully map the kinetics that give rise to it. You feel that there
may well be a very general principle here: The availability of a reversible
field is a prerequisite for the kind of controlled collisions that your particle
exhibits with respect to its surround.



What properties arise in the flow field caused by the particle's motion
relative to the surround? A coarse analysis reveals the following: kinematic
properties, consisting of (i) transformations defined over the entire flow
field, such as outflow from a point and inflow to a point, and (ii) the inverse
of the rate of dilation of a topologically closed region of the field;
and (iii) geometric properties, viz., singularities, such as foci of outflow
and inflow. Global transformations ((i) above) are specific to the displacement
of the particle as a unit relative to the surroundings (moving forward or
backward); the inverse of the rate of dilation (ii), a property you recall
reading about in the astronomer Hoyle's science-fiction novel The Black Cloud
(1957), is specific to the time at which the particle will contact a region on
its path, while the first derivative of this property, which is seen to be a
dimensionless quantity, is specific to the deceleration of the particle with
respect to the approach region. The foci of flow (iii) will be specific to
the regions, or to the gaps between them, toward which the particle is moving;
that is, the foci are specific to the direction of the particle's trajectory.
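
One way to make the time-to-contact property concrete is to treat the dilating region as a disc of angular size theta seen from the particle, so that the inverse of the relative rate of dilation, theta / (d theta / dt), approximates distance divided by closing speed. The sketch below assumes a small disc, a constant closing speed, and made-up numbers; the function names are hypothetical.

```python
import math


def angular_size(radius, distance):
    """Angle (rad) subtended at the particle by a disc of the given radius."""
    return 2.0 * math.atan(radius / distance)


def tau_from_dilation(theta_prev, theta_now, theta_next, dt):
    """Inverse of the relative rate of dilation: theta / (d theta / dt)."""
    rate = (theta_next - theta_prev) / (2.0 * dt)
    return theta_now / rate


# Assumed approach: a region of radius 0.5 m, initially 40 m away,
# closed on at a constant 4 m/s.
radius, speed, dt = 0.5, 4.0, 0.001


def distance_at(t):
    return 40.0 - speed * t


def tau_at(t):
    return tau_from_dilation(
        angular_size(radius, distance_at(t - dt)),
        angular_size(radius, distance_at(t)),
        angular_size(radius, distance_at(t + dt)),
        dt,
    )


tau_now = tau_at(2.0)                         # true value: (40 - 8) / 4 = 8 s
tau_dot = (tau_at(2.5) - tau_at(1.5)) / 1.0   # change in tau per second
```

For this unbraked approach the optical estimate stays close to the true 8 s, and its time derivative stays close to -1, a pure number with no units. Braking would raise that derivative above -1.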

It is obvious to you that under normal circumstances, the style and/or
rate of transformation will not be uniform throughout the entire kinematic
field; rather, there will be discontinuities caused by region boundaries that
will identify more precisely the relationship between the moving particle and
a particular layout of dense regions (depots of mass). For example, within
the global outflow "local" properties will be revealed, such as: (i) a gain
of structure inside a closed contour in the field specifies an opening in a
dense region through which the particle could travel, (ii) a loss of structure
outside a closed contour in the field specifies an obstacle to the particle's
current trajectory.

Clearly, motion of the particle gives rise to properties that do not
exist when the particle is immobile. The properties identified above, both
kinematic and geometric, are annihilated when the temporal dimension goes to
zero and the ambient kinematic field is reduced to an ambient geometric field.
For example, "streaming" engendered by the particle's motion condenses out
geometric, rate-independent points, the singularities, that are not identified
by a geometric field analysis. A geometric field analysis at any instant of
time would not contain the singularities.

Conclusion 8: You are drawn to the fact that your cursory examination of
the properties of the kinematic field (caused by the displacement of the
particle relative to the surround) revealed a dimensionless number: The first
derivative of a kinematic field property specifying time-to-contact. What
intrigues you is the possibility of an analogy between the dimensionless quantities
of the kinematic field (assuming that there are more to be discovered)
and the dimensionless quantities that order a kinetic field, such as a
hydrodynamic field.

The transition from one state to a qualitatively distinct state of a physical
system usually indexes a critical change in the relation between two
competing processes. Your favorite example is the transition from laminar
flow to turbulence, which occurs when the processes (viscous, dissipative,
irreversible) that resist fluid motion cannot, in their current organization,
balance the processes (inertial, conservative, reversible) that sustain fluid
motion. The dimensionless Reynolds number gives an index of the competition
between inertial (etc.) and viscous (etc.) processes. High inertial forces
favor turbulence, with the pronounced internal shearing that that implies.
High viscous forces prohibit sustained turbulence by damping motions that lead
to discontinuity (e.g., eddies) and thus ensure laminar flow. The inertial
processes are governed by Newton's law of inertia and the viscous processes
are governed by the law for shear stress of a Newtonian fluid. The Reynolds
number, therefore, might be described as indexing the relation between the two
laws. On either side of the critical value of the Reynolds number the two
laws are mutually cooperative, whereas at a critical value one of the two laws
dominates the other (that is, a competition occurs).
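
The way a dimensionless number marks off behavioral modes can be sketched with the usual definition of the Reynolds number, Re = (density x speed x length) / viscosity. In the sketch below the threshold of 2300 is a commonly quoted approximate value for flow in a pipe and is used only as an illustrative default; the fluid properties are rough figures for water, and the function names are hypothetical.

```python
def reynolds_number(density, speed, length, viscosity):
    """Ratio of inertial to viscous effects: Re = rho * v * L / mu."""
    return density * speed * length / viscosity


def flow_mode(re, critical=2300.0):
    """Classify the flow by comparing Re with an assumed critical value."""
    return "laminar" if re < critical else "turbulent"


# Assumed example: water (about 1000 kg/m^3, about 0.001 Pa s) in a
# pipe 0.02 m across, at a slow and a fast mean speed.
re_slow = reynolds_number(1000.0, 0.01, 0.02, 0.001)
re_fast = reynolds_number(1000.0, 1.0, 0.02, 0.001)
modes = (flow_mode(re_slow), flow_mode(re_fast))
```

A continuous change in one variable (here the speed) carries the pure number across its critical value, and the qualitative mode of the flow changes with it. In practice the transition is not perfectly sharp, so the single threshold is a simplification.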

You are aware that, as a general rule, any major dimensionless number
used in physics can be derived directly from the laws known to apply to the
phenomenon to which the number refers (Schuring, 1977). A dimensionless number
is often referred to as a Pi number (Buckingham, 1914) and when it is
derivable from one or more laws, it is termed a principal Pi number (Schuring,
1977). The important thing you note here is the linkage between physical
states of affairs that principal Pi numbers index and the facts of critical
values and behavioral modes (or natural categories). As you see it, the shift
in balance between two (or more) laws governing a phenomenon from situations
in which they cooperate to situations in which one law alone is responsible
can produce categorically distinct states. The transition from cooperation to
competition between governing laws is tantamount to a natural boundary-making
device: behavioral modes are created, critical values of one or more variables
are defined.

In sum, the critical values of dimensionless quantities in the kinetic
cases mark off distinct physical states. It does not seem likely to you that
dimensionless quantities will play this role in the kinematic field of
constraint because of the absence of forces, by definition, in the kinematic
field. But you cannot be too sure, one way or the other. For the present,
however, it seems prudent to emphasize the specificational rather than the
physical nature of the kinematic field. This emphasis raises the question:
Do dimensionless quantities in the kinematic field mark off, at critical
values, distinct specificational states?

A soft collision with no momentum exchange between the particle of
interest and a nonmoving, dense region on its path requires that the particle
decelerate. A deceleration is adequate if and only if the distance it will take
the particle to stop with that deceleration is less than or equal to the
particle's current distance from the region of upcoming contact. Your
calculations show that for any particle of the type you are studying a
deceleration is adequate if and only if

$$\mathrm{Pi(contact)} = \frac{d\tau(t)}{dt} \geq -0.5$$

where $\tau(t)$ is the time-to-contact variable of the kinematic field. You
state this result as follows: When less than -0.5, the dimensionless
quantity, Pi(contact), specifies that the particle will experience a momentum
bump if present conditions persist; when equal to or greater than -0.5,
Pi(contact) specifies that the particle's contact with the upcoming region will
involve no momentum exchange if present conditions persist.
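
A minimal sketch of why -0.5 is the critical value, assuming one-dimensional motion, a constant deceleration, and time-to-contact defined as current distance divided by current closing speed (these assumptions and all function names are introduced here for illustration). Differentiating distance/speed under constant deceleration gives a rate of change of -1 + distance x deceleration / speed^2, which is at least -0.5 exactly when the stopping distance speed^2 / (2 x deceleration) does not exceed the current distance.

```python
def tau_dot(distance, speed, deceleration):
    """Rate of change of time-to-contact (distance / speed) under constant deceleration."""
    return -1.0 + distance * deceleration / speed ** 2


def stopping_distance(speed, deceleration):
    """Distance needed to come to rest from the given speed at constant deceleration."""
    return speed ** 2 / (2.0 * deceleration)


def deceleration_is_adequate(distance, speed, deceleration):
    """Apply the criterion Pi(contact) >= -0.5."""
    return tau_dot(distance, speed, deceleration) >= -0.5


# Two cases at the same distance (10) and closing speed (10), arbitrary units.
print(deceleration_is_adequate(10.0, 10.0, 5.0))  # stopping distance equals the distance
print(deceleration_is_adequate(10.0, 10.0, 4.0))  # stopping distance exceeds the distance
```

The criterion needs only the kinematic quantity and its rate of change; neither the distance nor the speed has to be known separately to decide whether the present deceleration will do.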

You are encouraged by the results of your analysis. It does seem that
critical values of dimensionless quantities in the kinematic field distinguish
qualitatively distinct specificational states. And it seems to you
that the analogy should be pursued further. For example, you might ask: What
kinds of laws go into the construction of Pi numbers applicable to the
kinematic field?



Conclusion 9: Because the kinematic field ambient to the particle
constrains its trajectory, you infer that the field and the particle must be
coupled. This coupling is obviously "soft" rather than "hard." The question
to which you now turn is: What must be required of the particle and of this
soft-coupling if the particle is to be constrainable in a way that makes its
collisions controllable? What must be true of the particle so that it can be
reliably constrained by the kinematic field?

It appears to you that there are two important and very general
conditions on the coupling. One condition is that the coupling be linear. What
would have to be true of the particle's interior in order to guarantee a
linear coupling? The interior of the particle could be in either a reversible
or irreversible steady state. If it were reversible, the distribution of
conserved quantities would be (nearly) uniform and the interior would be
(approximately) at equilibrium. This means that there would be no problem of
'connectivity': A disturbance felt by any region of the interior could be
transported to any other region of the interior, however remote. On the other
hand, if the interior's steady state was irreversible, then there would be
marked and persistent source-sink gradients. As a consequence, a disturbance
felt in one part of the interior may not be transported to other parts.
Conservations are not carried up gradients and, conventionally, it is through
the transport of conserved quantities that one part of a physical system
"informs" another part about what it is doing. A loss of connectivity among
the regions that accompanies irreversible steady states means that the overall
effects of the kinematic field on the particle's interior (however those
effects are realized) could be discontinuous and equivocal. In short, it seems
to you that if the steady state of the interior were irreversible and far from
equilibrium, then there would not be a constant scale for laws relating
properties of the kinematic field to force trajectories of the particle. You
are led to assume, therefore, that a linear coupling, which would be both
flexible and precise, requires a reversible, close-to-equilibrium steady
state. This is tantamount to assuming that the state space of the particle's
force trajectories is quasiergodic (that is, no strong preferences or
dislikes): The particle should not be biased in a way that undercuts the
specifying capability of the kinematic field.

The other condition on the coupling is that the criterial "smooth and
unitary process" be upheld. This condition would be met only if the coupling
involves very little energy (relative to the energy stored and dissipated by
the particle). A coupling achieved at high energy expense might take too long
(there would be steep external gradients) or it might involve a large momentum
exchange and irreversible processes (marked by stress and shock waves). You
conclude that there must be an energetically cheap translational gate
effecting the coupling of the particle to the kinematic field. Or, said
differently, you conclude that the kinematic field is the spatio-temporal
structure of a low energy field. Your best hunch is that this low energy field
is the electromagnetic field modulated by the absorption/emission properties
of the surround.

Conclusion 10: Some of your observations of the particle's trajectories
are especially puzzling. Two of them are depicted in Figures 3 and 4. In one
observation (Figure 3), you noted that your particle mimicked the trajectory
of another particle of like kind. The two trajectories were, for a time,
coupled. This coupling of trajectories did not depend on the distance between
the particles. Sometimes you witnessed the coupling when the particles were
very close (Figure 3a). At other times you saw the coupling when the
particles were separated by a substantial distance (Figure 3b).

In the other observation (Figure 4) you noted that your particle's
trajectory would follow, without contact, the border of a dense region in the
surround. Here it seemed that there was another temporary coupling, between
the form of the particle's trajectory and the form of a region.

Why do you find these observations especially puzzling? It is because,
as a physicist, you are committed to explaining any coupling (coordination or
cooperativity) of one thing with another through conservation principles, and
it is not immediately obvious to you what the principles are that apply to the
two couplings depicted in Figures 3 and 4. If you had observed two more
conventional particles coupled in interaction, then you would have said that
(1) some quantity was exchanged between the particles, at the very least
momentum and energy; and (2) the coupling was an instance of coordination or
cooperativity because the exchange of quantities between the particles is
constrained by the requirement that these quantities be conserved over the pair
of particles. You would explain the loss of degrees of freedom that marks an
interaction between particles by an appeal to conservational invariants.

You feel, therefore, that you have no option but to identify the
conservations that account for the coupling phenomena depicted in Figures 3
and 4. Because the "mimicking" phenomenon is indifferent to particle
separation, you believe that the conservations in question are unlikely to be
energy or momentum related. Conventionally, couplings based on energy exchange
depend on the distance between the particles (i.e., the inverse square law).

After a good deal of deliberation and hesitation you suggest the
following: One of the conservations accounting for phenomena of the type
depicted in Figures 3 and 4 must be conservation of topological form. (You
believe that this conservation is integral to these instances of cooperativity
but recognize that this conservation alone cannot account for the loss of
degrees of freedom.) Your use of topological form is intuitive rather than
technical. You mean, most generally, adjacencies and successivities, that is,
neighborhoods in space and time. And you mean, more particularly, properties
of the kind captured in contrasts such as inner/outer, sooner/later,
lower/higher, closer/further, slower/faster, larger/smaller, and so on.
Further, your use of conservation is intended to mean that from one "slice" to
another, of the kinematic field that couples the particle to the surround, the
topological form is constant. This conservation of adjacencies and
successivities from a location proximate to the source to a location distal to
the source is made possible by the reversible, equilibrium, low-energy nature
of the kinematic field. Identifying the two particles in Figure 3 as kinetic
fields, it is clear that the adjacencies and successivities arising from one
kinetic field are perfectly conserved over the distance that separates the two
kinetic fields. The proof is in the adjacencies and successivities arising
from the second kinetic field (your particle): they duplicate those arising
from the first.

Figure 4. The particle's trajectory follows the border of a dense region of
the surround without contacting it.

Conclusion 11: A better stab can now be made at the machine conception
befitting the constraining of the forces that determine the particle's
trajectory. You have come to the understanding that whatever the machine
conception, it cannot apply just to the particle; rather, it must apply
minimally to both the particle and to the kinematic field that is lawfully
generated by the surround and the particle's displacement relative to it. It
is very obviously true that the particle and the kinematic field are
distinguishable. They clearly are different materially and, further, the
particle, as a source of forces, is a kinetic field. Given that they are so
different, you are puzzled by the principle that relates them as a single
machine.



Now you are set to thinking: What, after all, is a machine? Turning to
examples of hard-molded machines you are struck by the fact that they are
always closed kinematic chains, where a chain consists of kinematic pairs of
elements, for example, shaft and bearing, bolt and nut, etc. Each element in
a pair, because of its resistant material qualities and its form, envelops and
constrains the other so that all motions except those desired in the mechanism
are prevented. There is kinematic closure. You can appreciate why a
thoughtful student of hard-molded machines might say that a machine consists
solely of elements that correspond, pairwise, reciprocally. Kinematic closure
is the central principle governing the construction of hard-molded machines.

Two other features of hard-molded machines capture your attention.
First, in a closed pair of elements the roles of "fixed" and "movable" can be
exchanged (for example, the nut can rotate and translate relative to the fixed
bolt or the bolt can rotate and translate relative to the fixed nut). This
inversion of roles causes no change in the motion belonging to the pair, as you
show in a sketch (Figure 5). In both of the situations shown in your sketch
the separation between the nut and the head of the bolt is decreasing.
Second, although it is common for a pair of elements to be completely closed in
terms of bodily envelopment, it is not necessary. The closure that prevents
certain motions from occurring can be achieved without material structures;
you note, for example, how vertical downward closing forces keep the wheels of
a train in contact with the rails.

Figure 5. An example of a hard-molded machine. The distance between the nut
and the head of the bolt can be decreased either by turning the
bolt relative to the fixed nut as in (a) or turning the nut
relative to the fixed bolt as in (b).

Figure 6. An example of a kinematically closed soft-molded machine. The
distance between the particle and the surround can be decreased either
by moving the particle relative to the fixed surround as in A or
moving the surround relative to the fixed particle as in B.

It occurs to you that this invariant characteristic of hard-molded
machines (reciprocally constraining, kinematic pairs) may well be an invariant
characteristic of all machines, including the soft-molded machine you are
trying to understand. Are the paired elements of this machine, the particle and
the field ambient to the particle, kinematically closed? If there is a
generalizable principle of kinematic closure, as you suppose, then the
particle and the ambient field should pass the inversion test: For example,
fixing the entire surround and moving the particle in one direction should have
the same consequence as fixing the particle and moving the entire surround in
the opposite direction. In Figure 6, situation A should be indistinguishable
from situation B.
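
The inversion test can be sketched in one dimension, assuming that what matters kinematically is only the separation between particle and surround. The positions, the step size, and the function names below are illustrative assumptions, not quantities from the text.

```python
def separation(particle_x, surround_x):
    """Signed distance from the particle to a reference point of the surround."""
    return surround_x - particle_x


def move_particle(particle_x, surround_x, step):
    """Situation A: surround fixed, particle displaced by +step."""
    return particle_x + step, surround_x


def move_surround(particle_x, surround_x, step):
    """Situation B: particle fixed, surround displaced by -step."""
    return particle_x, surround_x - step


start = (0, 10)  # particle at 0, surround reference point at 10 (arbitrary units)
after_a = separation(*move_particle(*start, 3))
after_b = separation(*move_surround(*start, 3))
print(after_a, after_b)
```

Both situations leave the pair with the same separation, which is the sense in which the roles of "fixed" and "movable" can be exchanged without changing the motion belonging to the pair.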

Your empirical validation proceeds as follows: You note a location where
the particle frequently comes to rest. (It is natural to assume that this
location is a singularity, a stable location of minimal potential energy, in
the particle-surround system.) You then arrange matters so that on the next
occasion that the particle is immobile at that location, the entire surround
moves relative to the particle. You observe that the particle displaces in
the same direction as the surround. You conclude that the vector flow field
lawfully generated by the displacement of the surround in direction +X
specifies a displacement of the particle from the singularity in direction -X.
Hence, the particle displaces in direction +X toward the singularity.

This kind of kinematic closure differs from the more familiar types. The
two familiar types you have already remarked upon might be labeled (1)
kinematic closure through resistant bodies and (2) kinematic closure through
forces. The kinematic closure you are now promoting is (3) kinematic closure
through specification. The three types are alike in that the realization of
any particular motion requires that a special relation hold between the paired
elements. You are convinced that if you were observing your particle on a
rectilinear trajectory toward a given region of the surround and you intruded
on the flow field by some means so as to introduce a prolonged rotational
component into the flow field, then the rectilinear trajectory would not be
maintained. To realize any given trajectory of the particle, a symmetry must
exist between that trajectory and the flow field: For the particle to move
clockwise there must be a counterclockwise flow; for the particle to move
toward p there must be a flow centered at p, and so on. Although it is very
clear to you that for your particle and its ambient field this symmetry always
holds, the point that you wish to underline is that in the absence of this
symmetry an "intended" trajectory cannot be satisfied.

You are absorbed by what the foregoing reasoning implies, namely, that
there might well be a similitude for all machines, hard-molded and
soft-molded. The invariant feature of machines seems to be kinematic closure
achieved by reciprocal contexts of constraint; kinematic closure seems to be
founded on a symmetry between the paired elements. To your journeyman
understanding, this symmetry reads: There is a transformation T such that if A
and B are the paired elements, then $T(A) \rightarrow B$ and
$T(B) \rightarrow A$. You recognize that this transformation T is the
mathematical notion of a duality operation and that the elements A and B are
mathematical duals. You pose the question: What is the significance of the
duality nature of machines? Tentatively you answer that if the prerequisite
for constraining forces to produce selective, determinate motions is a duality
structure, then duality must be a symmetry property of the most basic kind.
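
One way to picture the duality operation, offered only as a sketch: treat T as a map that exchanges the two paired elements, so that applying it twice returns the starting element. The helper name make_duality and the string labels are hypothetical stand-ins chosen for illustration.

```python
def make_duality(a, b):
    """Return a transformation T with T(a) = b and T(b) = a."""
    mapping = {a: b, b: a}

    def transform(element):
        return mapping[element]

    return transform


T = make_duality("particle", "field")
print(T("particle"), T("field"), T(T("particle")))
```

Because T sends each element of the pair to the other, knowing one element and the transformation is enough to recover its counterpart, which is the property the argument relies on.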

Conclusion 12: In controlled collisions the particle must produce
changes in force that are commensurate with changes in the kinematic field.
Two examples come to mind: (1) to effect a soft collision any fluctuations in
Pi(contact) that carry this quantity below its critical value must be
countered by fluctuations in the control quantity, C, that are of commensurate
amplitude; (2) if the surround is caused to fluctuate, so as to produce
oscillatory global outflow and inflow of the kinematic field, the particle's
position will similarly fluctuate, 180° out of phase. The particle's
commensurate fluctuations are the result of force changes in proportion to
flow changes.

Your earlier conclusions about the conditions of the coupling of particle
and field are incomplete. They do not identify a principled physical basis
for force differences that are proportional to flow differences. When
considering hydrodynamic flow, you normally visualize a process in which an
inhomogeneity in potential gives rise to a force that drives a flow. More
generally, differences in potential ($\Delta P$) give rise to differences in
force ($\Delta F$) that, in turn, give rise to differences in flow
($\Delta V$):

$$\Delta P \rightarrow \Delta F \rightarrow \Delta V.$$

Flows are proportional to forces, and where the Onsager condition holds,
sensible deductions can be made in many instances from the macroscopic
hydrodynamic flow to the irreversible thermodynamics that is its basis. The
problem your particle poses is different from this conventional problem. It
reverses the causal path and asks how flows can give rise to proportionate
forces. Here, the causal vocabulary looks strained. But you are aware that
you have felt this strain throughout your analysis. Thus you have spoken of
the kinetic fields (particle and surround) as causing the kinematic field and
the kinematic field as specifying and, cognately, constraining the kinetic
field.

You remind yourself of some basics: Changes in motion or flow per se
cannot cause changes of force; there can be no forces where there are no
potential differences; the trajectory of force depends on the form of the
potential. You surmise that if a flow is to affect a force it must do so by
modifying the potential from which the force is derived. Modulating a
potential would not necessarily cause a change of force; generally, other
conditions must be satisfied. This reservation is consonant with your
observation of the influence of the flow field on the particle: only global
changes in flow lead invariably to changes in force. So, a change in force may
or may not occur given a change in flow, but what you are after is a lawful
basis for these changes whenever they do occur.

The problem has been refocused: How could a flow affect a potential?
Formally, a force F is defined as the negative of the potential inhomogeneity
or, more precisely, gradient, viz.,

$$F = -\nabla P,$$

where the gradient symbolized by $\nabla$ is a spatial gradient. If P is
identified as the particle's on-board potential, which is taken to be nearly
homogeneous (given the arguments you made about the reversible,
close-to-equilibrium steady state of the particle in Conclusion 9), then you
must look to the kinematic field as the source of the inhomogeneity; that is,
as specifying a spatial operator, $\nabla$. Now, by taking the first
derivative of both sides of the above expression for F you get

$$\frac{dF}{dt} = -\frac{d(\nabla P)}{dt};$$




that is, control (see Conclusion 1) is given by the rate of change of the
product of the spatial operator and the potential. In the foregoing context
the first derivative of $-\nabla P$ defines a temporal gradient. As with the
spatial gradient, you take the temporal gradient to be an operator defined by
the kinematic field. Assuming commutativity the preceding expression for the
control quantity can be written

$$\frac{dF}{dt} = -\nabla \frac{dP}{dt} = -\frac{\partial^2 P}{\partial X\,\partial t},$$

where $\partial X$ is the spatial operator and $\partial t$ is the temporal
operator. In sum, the answer to the question of "how could a flow affect a
potential?" seems to require the recognition and understanding of space and
time operators on potentials. Given that the units of space and time must be
in the scale of the particle (expressed in terms of the mean free path
$\delta$ and the mean relaxation time $\tau$ of the particle's interior), the
control quantity ought to be reducible to an expression in P, $\delta$
changes, and $\tau$ changes.
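
The commutativity assumed in the step above can be checked numerically for a smooth potential. The sketch below uses central finite differences in one spatial dimension; the particular potential sin(x) * exp(-t), the step size, and the function names are assumptions chosen only for illustration.

```python
import math


def potential(x, t):
    # Hypothetical smooth potential, not taken from the text.
    return math.sin(x) * math.exp(-t)


def force(x, t, h=1e-3):
    # F = -dP/dx, estimated by a central difference in space.
    return -(potential(x + h, t) - potential(x - h, t)) / (2 * h)


def control_time_after_space(x, t, h=1e-3):
    # dF/dt: take the spatial gradient first, then differentiate in time.
    return (force(x, t + h, h) - force(x, t - h, h)) / (2 * h)


def control_space_after_time(x, t, h=1e-3):
    # -d/dx (dP/dt): differentiate in time first, then take the spatial gradient.
    def dp_dt(xx):
        return (potential(xx, t + h) - potential(xx, t - h)) / (2 * h)

    return -(dp_dt(x + h) - dp_dt(x - h)) / (2 * h)


x0, t0 = 0.3, 0.5
exact = math.cos(x0) * math.exp(-t0)  # analytic value of -d2P/dx dt for this potential
print(control_time_after_space(x0, t0), control_space_after_time(x0, t0), exact)
```

For a potential this smooth the two orders of differentiation agree to within numerical error, so the control quantity can be read either as the time rate of change of force or as the spatial gradient of the rate of change of potential.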

As a further point, the ordering of potential, force, and flow that you
are suggesting here is different from that which follows from considerations
of hydrodynamic flow, namely:

$$\Delta V \rightarrow \Delta P \rightarrow \Delta F.$$

It would be prudent, however, to relate the two orderings. You go for the
most obvious relation:

$$\Delta P \rightarrow \Delta F \rightarrow \Delta V \rightarrow \Delta P.$$

The flow field ($\Delta V$) and energy flux ($\Delta P \rightarrow \Delta F$)
are linked in "circular causality." You underscore that these two "paths" of
influence are not the same. First, the flux-to-flow path is a change in layout
(e.g., a flow is produced when the particle as a unit displaces relative to the
layout of the surrounding regions) whereas the flow-to-flux path is through the
translational gate you identified in Conclusion 9. Second, comparatively
speaking, the flux-to-flow path is energetically expensive, whereas the
flow-to-flux path is energetically cheap (see Conclusion 9). (You resist
identifying these paths with the cybernetic notions of "forward fed" causality
and "backward fed" causality. You feel that such a move is regressive given
that the notions of feedforward and feedback imply a referent signal, a
comparator and, more generally, a separate controller. The origin and
functioning of each of these would have to be rationalized by physical
principles. [As a physicist you wish to explain the phenomenon of controlled
collisions without the introduction of controllers sui generis.] Moreover,
you feel that the different labeling of the pathways, as forward and backward,
while well-motivated in artifactual situations, is arbitrary in natural
situations.)

Conclusion 13: A controlled collision is a physical event in space-time.
It is, however, by the conventional theory of physical events, a very odd kind
of event. You struggle to formulate its heterodox quality: A controlled
collision is a space-time event in which the final conditions of a particle's
motions determine the values that the initial conditions must assume. (You
had observed repeatedly, for example, that when the particle softly collided
and when it violently collided with a region of the surround, its accelerative
change prior to collision was initiated at two different marginal values of
the time-to-contact property.) This heterodox quality suggests to you a
structure of space-time peculiar to controlled collisions, one that is
explicitly shaped by both initial and final conditions. As a physicist you
are well aware of the need to be clear on the space-time structure of events.
Without a prescription for putting space-time boundaries on an event, the
determination of its causal basis remains very much a guessing game. Within
what limits should you try to close the bookkeeping on the relevant
summational invariants, the conservations? You turn your attention to
conventional physical event theory to see how well it fares in this regard and
to see what modifications will be required.

In the conventional theory, "observer" refers to the measurement of the
location of an event in space-time. As a local reference system or inertial
frame, the observer must be perspective free. Measurements must be made
simultaneously and distributively throughout a given region of space-time.
The "observer," therefore, must be capable of existing everywhere in a
specified region of space-time. Your particle "observes" and "measures" (its
surroundings and its relation to them). However, given that it is of finite
size (rather than being infinitely small) and can exist in only one place at
any one time, it cannot be identified with the observer in orthodox physical
event theory: Your particle must have a perspective. You suspect that this
fact will be of importance in the eventual formulation of the laws of
controlled collisions.

Corollary to the absence of a real or natural perspective in physical
event theory is the absence of an historical perspective. While the present
is causally constrained by the immediate past, it is not (to borrow a term
from Bertrand Russell) mnemically conditioned by the distant past. You sketch
for yourself the Minkowski diagram (Figure 7) that illustrates the causal
light cone that is the traditional domain of physical event theory. (Figure
7b is a simplified version of Figure 7a, with x, y, z reduced to a single
spatial (s) axis.) With the speed of light as the limiting boundary, only those
events within the same forward light cone can be causally connected to the
present event at the origin, t = 0 (because there are no known superluminal
signals, events outside the light cone cannot be connected with those inside).
The events leading up to the present are nowhere represented. The premise of
the orthodox theory is that the past is instantiated in the present and that,
together with the laws of motion, is sufficient to predict or explain event
outcomes. The particle you have been studying makes you skeptical of this
premise. Somehow the final conditions must be brought in, explicitly, to
accommodate controlled collisions.
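
The light-cone condition described above can be sketched for the simplified diagram with one spatial axis. An event offset by a time dt and a distance ds from the event at the origin lies within a cone when |ds| <= c * |dt|, with the boundary counted as inside. The function name, the labels it returns, and the choice of units with c = 1 are assumptions made for illustration.

```python
def cone_region(dt, ds, c=1.0):
    """Locate an event at offset (dt, ds) relative to an event at the origin."""
    if dt == 0 and ds == 0:
        return "origin"
    if abs(ds) > c * abs(dt):
        return "outside"  # no signal at or below speed c can connect the two events
    return "future cone" if dt > 0 else "past cone"


for offset in [(2.0, 1.0), (1.0, 2.0), (-2.0, 1.0), (1.0, 1.0)]:
    print(offset, cone_region(*offset))
```

Only events in the future cone can be affected by the event at the origin, and only events in the past cone can have affected it; events outside both cones are causally disconnected from it.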

You try to close in on what this would require by producing a series of
modifications of the Minkowski diagram. First, you include a past light cone
that converges at t = 0, the event from which the forward or future light cone
diverges. In your modified sketch (Figure 8) you have rotated the axes so
that time flows from left to right. Next, you depict four events in your
sketch (Figure 9). The events $E_1$, $E_2$, and $E_3$ are on the same world
line where $E_3$ is causally constrained by $E_2$ and $E_2$ is causally
constrained by $E_1$. You take pains to note that the causal constraints are
not necessarily transitive for these interactional sequences (that is, $E_3$
is not necessarily causally constrained by $E_1$). This is because $E'$, which
is on a world line with $E_3$,
Figure 7. (a) The causal light cone determined by time (t) and three spatial
dimensions, x, y, and z. (b) The causal light cone where x, y, and
z have been reduced to a single spatial axis (s), showing the speed
of light, c, as the limiting boundary.

Figure 8. A modified Minkowski diagram rotated so that time flows from left
to right. It includes a mnemic (past) light cone as well as the
standard causal (future) light cone.
might cancel (or otherwise alter) the effects of $E_1$. While $E_2$ transacts
with $E_3$ in the context of $E_2$'s historical relation to $E_1$, it does not
do so in terms of the historical context of $E'$. The rub, as you see it, is
that because $E'$ is outside the forward light cone of $E_1$ (it is
effectively simultaneous with $E_1$), its effects cannot be known at $E_1$
and, therefore, $E_3$ cannot be explained on the basis of $E_1$'s causal cone
alone.

Because unobservable events may exert an influence on future events,
necessary paths of influence cannot be discovered by working forward from
initial conditions to final conditions. You recognize, however, that
determinant histories may be discovered by working back from the final
conditions to the initial conditions. All of the influences on $E_3$ are in
its past or mnemic cone. In sum, the causal future of $E_1$ is only partially
accounted for by its forward cone but all of the determiners of $E_3$ are in
its mnemic cone. There is an asymmetry between the information derived from
history and the information applicable to the future.


You are inclined to believe that the only appropriate framework for
controlled collisions must be composed of the causal and mnemic perspectives
together. But is this framework to be one in which these perspectives remain
asymmetric? Or, more accurately, is there a different level of analysis that
may reveal the symmetry of the event space for controlled collisions? You
pose this question because of a major lesson learned from orthodox physical
event theory: Putting symmetry at the forefront reveals the structure of
space-time and fetters the application of law. Knowing the symmetry that
defines a space-time event means that if one element of an event is known, the
nature of its symmetric counterpart is also known.

You modify your sketch of the Minkowski diagram once again, this time
creating a bounded region between the causal and mnemic cones of two
succeeding events (Figure 10). You are now ready to propose a symmetry
postulate for controlled collisions: If (i) $E_1$ (approach to a region) and
$E_2$ (contact) are on the same world line (where $E_2$ is in the causal cone
of $E_1$ and $E_1$ is in the mnemic cone of $E_2$) and (ii) there are no events
outside the causal cone of $E_1$ that influence $E_2$, then $E_1$ and $E_2$
together define a new event, call it $E_p$, for which they are dual
perspectives. The past and future cones have been merged into a higher order
event space. Events outside the bounded region have no existence for the
particle; they are in neither its history nor its future. Events inside the
bounded region have relative existence. The new event is a controlled
collision and it will be guaranteed whenever the symmetry conditions (i and ii
above) hold.

In a further sketch (Figure 11) you contrast dual events with non-dual
events. The events $E_0$ and $E_1$ are duals, the events $E_2$ and $E_3$ are
duals, but $E_1$ and $E_2$ are not duals because condition (ii) is violated
($E_2$ is influenced by $E'$, which is in the null cone of $E_1$). What you
wish to show in this last sketch is that the specification of $E_2$ will be
indeterminate when based on the causal cone perspective of $E_1$. Moreover,
the selection of marginal values at $E_1$ to determine an outcome at $E_2$ is
not guaranteed to be successful since the basis for controlling the outcome at
$E_2$ is not completely available at $E_1$. A controlled collision cannot be
defined over $E_1$ and $E_2$ because they are not duals.
Figure 9. The causal relationships among four events. Although $E_3$ is in the
causal cone of $E_1$, it cannot be explained on this basis alone: $E'$
exerts an influence on $E_3$, yet is in the null cone of (and,
therefore, unknown at) $E_1$.
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Figure 11 



Eo and Ej are duals (note that E*;, though in the null cone of E^,, 
is not, on a world line with Ej), as are and E, (note that E^^ is 
at the limiting boundary of (and, therefore, is included i-ti) the 
mnen^ic cone of E^), Ej and Ej ^are not duals because E^ influences 
E^ but is in. the null cone of E, • ^ . 




Figure 12. Some event must exist that is causally proximal to E1. A change in
scale reveals the duality over which a controlled collision can be defined.

To restore or, more accurately, to reveal a duality, you suggest a change
in scale (Figure 12). At the grain of a finer space-time mesh there
necessarily exists some event, causally proximal and dual to E1, for which
a controlled collision can be minimally defined. This change in scale merely
assumes that the particle has limited sensitivity or acuity to distant events
on its world line. (In fact, your observations of many particles of varying
sizes reveal that there is a strong relationship between acuity and size. The
spatial range is a constant proportion of the vertical magnitude of the
particle. Simply put, large particles act with respect to things at a
greater absolute distance than do small particles.)

Your point is that for controlled collisions, any events antecedent to
some future event toward which the particle's current behavior is directed (1)
must lie within the particle's current causal perspective if they have
significant effects on the particle's immediate future or (2) must be trivial
in their effect if they lie undetected in the particle's null cone. Because
significant events cannot lie outside the bounded region of a controlled
collision, an appropriate scale of analysis that satisfies this condition
must exist. You insist that symmetry is the guide to finding this scale: Given
either the perspective from the initial conditions or the final conditions,
the other perspective is specified.

A Summary and an Awakening

You have discovered quite a lot about your particle, but its identity
still eludes you. You convince yourself that you have all the information you
need to identify this type of particle and it is only some firmly entrenched
bias that prevents you from seeing it. You think that you may have given a
physical description to the behavior of an entity that is usually considered
to be outside the domain of physics. Several of its properties are like those
of more standard particles, but you have noticed they often include less
standard twists. You review the properties you have discovered in the hope
that highlighting the "twists" might fuel an insight. (At the very least, it
will provide a convenient way to summarize these REM episodes.)

(1) The behavior of your particle can be ..described wi th a measurable 
quantity but this quantity, is control C^MV/T^) rather than the more standard 
momentum (MV). " , ' / 

(2) Forces determine the trajectory of your particle, but they are
dissipative rather than conservative forces and they originate not in the
surround but in the particle. Moreover, the particle can replenish its energy
supply.

(3) The distribution function that you constructed as a means of
classifying your particle reveals it to be in a class whose behavior is not
governed by velocity-dependent conservations.

(4) Your particle exhibits conservation, but it seems to be conservation
of population number, rather than the more standard energy or momentum or
mass. To accomplish this conservation, it appears to minimize momentum
transfers that might fracture the particle.

(5) Because your particle harnesses forces to achieve selective
trajectories, you consider it to be in the class of machines. But its
constraints are soft-molded, allowing flexibility in the strength of
collisions, rather than hard-molded.

(6) The soft constraint on the particle-based forces is a field, but it
cannot be associated with a force.

(7) Because the constraining field is not a force field, it cannot
include dimension M and, therefore, is not kinetic; because certain properties
that are necessary to the control of collisions are annihilated when t goes to
0, the field must include dimension T and, therefore, is not geometric. The
soft constraint must be a kinematic field.

(8) Critical values of dimensionless quantities in the kinematic field
distinguish between qualitatively distinct states, but these are
specificational states rather than physical states as would be the case in a
kinetic field.

(9) Because the kinematic field constrains the particle's trajectory, it
must be coupled somehow to the particle, but the coupling must be linear (so
that equivocalities are not introduced) and low energy (so that it does not
involve large momentum exchanges and irreversible processes).

(10) You explain the coupling through a conservation, but it is of
topological form (adjacencies and successivities) rather than of energy or
momentum.

(11) The machine conception (identified in Conclusion [4]) must apply
minimally to the particle and the field as duals, not just the particle. The
symmetry is necessary in order to realize and maintain trajectories.

(12) The flow field produces proportionate forces in the particle,
presumably by modulating a layout of potentials. Whereas the fact that forces
produce flows proportionate to the forces is understood, the fact that flows
produce forces proportionate to the flows is not.

(13) Controlled collisions, which are characteristic of your particle,
are physical events, but the structure of space-time is shaped by final
conditions as well as initial conditions. Where the particle is going colors
how it gets there.

What is this soft-coupled duality of particle and surround, wherein
collisions are guided by distinct specificational states that bring final
conditions to bear on initial conditions, and are controlled by the
dissipation of the particle's replenishable energy reserves in such a way as
to minimize momentum transfers that could fracture it? You seem to have
described a physics of controlled collisions, but for what... or whom...?

You are startled awake by the agitated chirping outside your window. The
bird is hovering about a feeder in an effort to replenish its fuel supply, but
a cat has appeared on the scene waiting to replenish itself by effecting a
violent, predatory collision on your friend. Fortunately for the bird, you
muse pretentiously, the imminence of contact with the cat is specified in the
optical flow field that links properties of the animal to properties of the
environment. You marvel, once again, as it guides its flight to avoid the cat
and locate the food, cutting its speed just in time to alight gently on the
feeder. Now those are the kinds of controlled encounters that Gibson wanted
to understand and that you've been trying to understand. You are suddenly
overcome with a sense of déjà vu, with a feeling that, at some level, you have
understood.






Figure 13. The dreamer awakes.

Footnotes

¹Iberall (1977) has suggested that the number of members of a biological
species is approximately conserved and a physics that accommodates biology
will require the addition of this conservation to the list of conventional
conservations.

²Gibson's optic array (1961, 1966, 1979) seems to be a field of this kind.

³Gibson repeatedly pointed out that optical motion is altogether different
from material motion, that optical motion has no inertia (for example,
Gibson, 1979).

⁴Properties of this kind were identified by Gibson (1966, 1979) for the
optical flow field resulting from the locomotion of an animal in a cluttered
environment.

⁵Lee (1976, 1980) identified this property for the condition in which a
point of observation approaches, or is approached by, a substantial
environmental surface.

⁶See Gibson's (1979) discussion of the optical support for the control of
locomotion.

⁷Lee (1976, 1980) performed these calculations and highlighted the
significance of the first derivative of the time-to-contact variable. Other
optically defined dimensionless quantities that order (at critical values)
specificational states have been suggested and experimentally examined by
Warren (in press).

⁸For animals, the photoreceptor processes perform the role of a
translational gate that involves very little energy relative to the animal's
daily energy expenditure.

⁹Such as Reuleaux (1963).

¹⁰Lishman and Lee (1973) have shown that in a room where the walls and
ceiling can move as a unit, displacement of the room causes a person standing
in the room to topple in the direction of the room's movement.

¹¹This point has been argued by Shaw and Turvey (1981).

¹²See Lee (1978).

¹³For Gibson (1966, 1979) the structure of an optical flow field is
always exterospecific and propriospecific; it is always specific to the layout
and to the observer.

¹⁴Kirschfeld (1976) reports that for animals there is a simple
first-order relation between visual resolution (R) and body-height (H),
R = k/H, where k is a constant of proportionality.

Appendix

A. Theory of Collisions

The concept of collision refers to forces applied to and removed from an
object in a very short period of time. The classical theory of collision,
based primarily on the impulse-momentum law for rigid bodies, regards the
colliding objects as single mass points. All elements of each object are
assumed to be rigidly connected and to be subjected instantaneously to one and
the same change of motion as the result of the collision. In reality, the
forces initiate stress waves that travel at finite velocity away from the
region of contact and through the object. These waves reflect from boundaries
of the object and interact with stress waves still being generated at the
region of contact to create a complex pattern of stresses and strains in the
interior. In short, all regions of an object subjected to a collision are not
exposed simultaneously to the same force conditions (Goldsmith, 1960).

The classical theory is most suited to ideal atomisms whose degrees of
freedom are exhausted by the three axes of translation. Atomism is a term
suggested by Iberall (1977) for an entity of any magnitude that is atom-like
at the scale of the ensemble to which it belongs. It is conventional to say
that ideal atomisms have no internal degrees of freedom, where "internal" has
the uncommon meaning of "extra-translational." Atomisms of gases such as
helium are closest to this ideal. They are single atoms each free to move on
the three spatial dimensions. For all intents and purposes, the total energy
imparted by collision to a helium atomism may be regarded as going into the
translation of the atomism. In terms of the equipartition theorem, the energy
received is divided evenly and completely among the atomism's degrees of
freedom, which are all translational.

The atomisms of another gas, oxygen, introduce a measure of internal
complexity. These atomisms (molecules) consist of two linked atoms. To
define the position of each of the atoms of oxygen requires three degrees of
freedom, for a total of six. However, the linkage between the atoms
eliminates a coordinate choice, thereby reducing the degrees of freedom of the
oxygen atomism to five. Because translation of the oxygen atomism's center of
mass consumes only three of the five degrees of freedom, the two degrees of
freedom that remain are "internal." The equipartition theorem would assign
three-fifths of the energy of collision to the translation of the atomism and
two-fifths of the energy to the internal bond. Clearly, conservation of
energy does not hold if only the energy carried by the translational degrees
of freedom is taken into account. It is for this reason that collisions of
atomisms with internal degrees of freedom are said to be inelastic and that
the conservation of momentum (rather than of energy) is the dominant
constraint on their equations of collision.
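
The degree-of-freedom bookkeeping for the helium and oxygen atomisms can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the text: the function names `atomism_dof` and `equipartition_shares` are hypothetical, and the model simply follows the passage in counting three coordinates per atom and removing one for each rigid linkage.

```python
from fractions import Fraction

TRANSLATIONAL_DOF = 3  # center-of-mass motion along the three spatial axes


def atomism_dof(n_atoms, n_linkages):
    """Total degrees of freedom: three per atom, minus one per rigid linkage."""
    return 3 * n_atoms - n_linkages


def equipartition_shares(total_dof):
    """Split collision energy evenly over all degrees of freedom.

    Returns (translational share, internal share) as exact fractions.
    """
    internal_dof = total_dof - TRANSLATIONAL_DOF
    return (Fraction(TRANSLATIONAL_DOF, total_dof),
            Fraction(internal_dof, total_dof))


helium = equipartition_shares(atomism_dof(n_atoms=1, n_linkages=0))
oxygen = equipartition_shares(atomism_dof(n_atoms=2, n_linkages=1))
print("helium:", helium[0], helium[1])  # all energy goes to translation
print("oxygen:", oxygen[0], oxygen[1])  # three-fifths and two-fifths
```

The internal share is the part of the collision energy that an account based only on translation would fail to find, which is why the text calls such collisions inelastic.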

Consideration of the collisions of di-atomic atomisms is a small step
toward the collisions of systems. In a statistical mechanical sense a system
is an ensemble of interacting atomisms with a boundary that prohibits the
ensemble from dissolving into the surround. The atomisms of a system may be
internally barren (like the helium atomism) or internally complex (of a kind
hinted at by the oxygen atomism). As noted, internal complexity is associated
with ways of absorbing the energy applied to a unitary thing other than
through the translation of its center of mass.

B. The Theory of Fracture

The first major advance beyond the classical theory of collisions (viz.,
the one-dimensional vibrational treatment of colliding objects) recognized the
significant proportion of energy going into oscillations when the system's
natural period is long compared to the duration of contact. Subsequent
analyses of the multi-dimensional aspect of wave propagation consequent to
collision, and of the stress distribution at the region of contact, were made
possible by developments in the theory of elasticity (Timoshenko & Goodier,
1951). It suffices to say, for present purposes, that elasticity refers to
the fact that the internally generated forces of restoration are comparable to
the externally applied forces of deformation so that there is a return to the
status quo ante on removal of the external forces.

In many collisions, however, the conditions of impact are such that the
entire cross-section of one or both of the colliding objects will exhibit a
final permanent strain of significant magnitude, or one or both of the objects
may fracture. Such non-reversible phenomena result from the conversion of
kinetic energy into permanent distortion or fracturing of the structure of the
object and the eventual dissipation of this energy in the form of heat. The
analysis of the irreversible deformations wrought by the propagation of
stresses that exceed the elastic limit (so-called plastic flows or plastic
waves) is a more recent and less developed aspect of collision theory
(Goldsmith, 1960).

Evidently, the responses of an internally complex system to collision
will be difficult to follow. It is possible, nevertheless, to obtain some
useful insights into the collision process by considering (a) the behavior of
a system under statically imposed forces and (b) the relation between impact
parameters and system failure, ignoring the internal responses.

The deformation resulting from loading a system statically can be treated
as a series of equilibrium states requiring no consideration of acceleration
effects or wave propagations. Of major interest is the response to static
loading of systems that exhibit a degree of rigidity, that is, systems that
preserve their form in the face of perturbations. The requirement, of course,
is that the system be elastic through some range of perturbation. Solids have
an elastic domain as do multiphase systems that are solid or gel in part, such
as living things that are dominated by elastic-plastic-fluid (liquid and gel)
processes (Yates, 1982).

The interior of a solid system can respond in one of three ways to an
applied force: (1) the linked atomisms can be forced further apart or closer
together than the equilibrium (minimal potential) distance; (2) atomisms can
hop into adjacent vacant lattice sites; and (3) the bonds between the
atomisms can be broken (Freudenthal, 1950; Nadai, 1950; Walton, 1976). If (1)
is sufficient to absorb the energy of loading, the solid is operating strictly
within its elastic domain. Suppose that a static loading is realized as a
force applied along an axis (a stress) so as to stretch or compress (more
generally, to strain) the system. Then response (1) means that the system as a
whole undergoes a coordinate transformation that changes the distances between
all the atomisms but not the topology of the system's internal configuration.
This response to static loading is reversible. It is, however, a response of
finite capacity. At some point the potential energy stored up within the
excessively strained bonds reaches a limit (the elastic yield) and new
mechanisms for accommodating the applied energy must be found (that is, a new
"escapement" arises). One escapement mechanism is the breaking of some bonds
between some atomisms (response 3). Another escapement is diffusion (response
2), which is enhanced considerably by the structural changes resulting from
bond breaking. (In a multiphase system at the elastic limit there is a
structural change in at least one phase; for example, in the continuous solid
phase of a two-phase solid-fluid system such as a gel or in the more rigid
phase of a polyphase solid-solid system such as a polycrystalline metal or a
polyphase solid-fluid system such as a high polymer.)

A brittle system (a physical ideal, an engineering myth) would fracture
at the elastic limit. There are no plastic deformations (flow processes) in a
brittle system and microscopic bond breaking becomes, immediately, macroscopic
fracture. For real, ductile systems, however, the yield point only identifies
that loading at which fracturing begins on the atomistic level. Once the
yield point is reached in a ductile system, the mutually reinforcing processes
of bond breaking and diffusion can continue to accommodate excessive energy
brought in by the static loading. The dissociating of some of the atomisms
makes it easier for other bound atomisms to migrate to locations that are more
stable than the locations that they currently occupy. This flow process is
irreversible: Less energy is required for an atomism to hop from a high to a
low potential site than vice versa. However, the consequent relaxing of some
bonds brought about by diffusion increases the strain on other, already
overstrained, bonds, disposing them to further fracture.

The micro-fracturing that begins at and proceeds beyond the yield point
reduces the long range order or cooperativity of the system (interpreted as
bonds that repeat regularly over many thousands of atomic distances). The
long range order is replaced by short range order or local cooperatives, not
unlike the "flow unit" of a liquid. The diffusion occurs at the surface of
these local clusters because the atomisms located there are thermally less
stable than their partners in the interior. Clearly, the larger the number of
local cooperatives and, therefore, internal surfaces, the greater the
diffusion. And the greater the diffusion the more disposed to breaking are the
already strained bonds at places in the system where diffusion of atomisms is
not possible. In sum, fracturing of the bonds between atomisms is a chain
reaction process and eventually a ductile system will fracture at the
macroscopic scale.

The emphasis of the foregoing has been the gradual progression of
macroscopic fracture, or system failure, as might occur under the repeated or
prolonged application of static forces that exceed the system's elastic limit.
In the range between the initiation of bond breaking on the microscale and the
occurrence of system failure on the macroscale, the system gradually loses its
ability to absorb the applied energy. A measure of the energy absorption of a
material is given by its stress (force per unit area)-strain (proportional
change in length) curve. The energy per unit volume is approximately equal to
the shaded area of Figure 14. Consequently, the strain energy to failure may
be approximated as follows:

$\text{energy/unit volume} = \tfrac{1}{2}(p_y + p_u)\,e_u$

where $p_y$ is the stress at the yield point and $p_u$ and $e_u$ are the
ultimate stress and ultimate strain, respectively, that mark the collapse of
the system.
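
A minimal numerical sketch of the strain-energy approximation may help. It assumes the approximation takes the common form of averaging the yield and ultimate stresses and multiplying by the ultimate strain; the printed formula is damaged in this copy, so that form is an assumption. The function name and the material values are hypothetical and chosen only for illustration.

```python
def strain_energy_to_failure(yield_stress, ultimate_stress, ultimate_strain):
    """Approximate energy absorbed per unit volume up to collapse.

    Assumed form: the mean of the yield and ultimate stresses multiplied by
    the ultimate strain, i.e. a rough area under the stress-strain curve.
    """
    return 0.5 * (yield_stress + ultimate_stress) * ultimate_strain


# Hypothetical material values, chosen only for illustration.
energy = strain_energy_to_failure(
    yield_stress=250e6,      # Pa
    ultimate_stress=400e6,   # Pa
    ultimate_strain=0.2,     # dimensionless
)
print(f"{energy:.3g} J per cubic meter")
```

Because stress is force per unit area and strain is dimensionless, the product has units of energy per unit volume, which is what the shaded area of the stress-strain curve represents.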

Of course, the loss of the ability to absorb energy could be quick, given
a collision. The microscopic processes leading to failure from a single brief
loading must be a rapid chain reaction of bond breaking associated with
elastic and plastic waves propagating from the point of contact and multiply
reflecting from the system's boundary. However, as noted, broad conclusions
relating failure to the conditions of collision are possible without
considering the complex of intermediary processes.

A collision will have an acceleration (of the system) x time profile.
Three examples of single loadings are given in Figure 15; to achieve a given
response amplitude, shorter durations of loading must be compensated by
greater accelerations. Two parameters are of special significance: the change
in velocity and the average acceleration (in units of gravity) that is just
sufficient to produce structural failure. In Figure 15 the cross-hatched
areas express the velocity changes. The average acceleration of any collision
is equal to the velocity change divided by duration. A collision sensitivity
curve can be generated by plotting criterial velocity change (where fracture
occurs) against criterial average acceleration (where fracture occurs)
(Kornhauser, 1964). A prototypical collision sensitivity plot for a
prototypical system is given in Figure 16. The vertical asymptote is related
to acceleration pulses that are steady or of long duration. It implies that no
failure occurs unless a certain average acceleration is exceeded, regardless
of the change in velocity of the system and the duration of the collision.
The horizontal asymptote is related to acceleration pulses of short duration.
It implies that system failure does not occur unless a certain velocity change
is exceeded regardless of the average acceleration value (Kornhauser, 1964).

Figure 14. The energy absorption per unit volume of a material is given by the

Figure 15. Acceleration x time profiles of collisions under three loading
durations (after Kornhauser, 1964).

Figure 16. Collision sensitivity plot shows where system failure will occur
(after Kornhauser, 1964).

The location of the vertical asymptote in Figure 16 is a function of the
shape of the collision (its acceleration x time profile). In contrast, the
horizontal asymptote is independent of the shape of the loading and is fully
characterized by a unique value of velocity change. Collision durations that
are short enough to be on the short duration asymptote (marked by (I) and
(II)) for a given system will result in the structural failure of that system.
There is some evidence (Kornhauser, 1964) to suggest that the collision
velocity change required for irreversible damage to mammals is relatively
indifferent to species and size (25 feet per second is a reasonable
approximation). The criterial average acceleration, however, differs markedly
with species and size (roughly, 20 g for man and 650 g for mice).
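
The two asymptotes of the collision sensitivity plot can be read as a pair of thresholds. The following Python sketch is an idealization that replaces the curve by its asymptotes, so it ignores the curved region where they meet. The function names are hypothetical, feet and seconds are assumed as units, and the criterial values for man are the ones quoted in the text.

```python
G_FT_PER_S2 = 32.2  # standard gravity in feet per second squared


def average_acceleration_g(velocity_change_ft_s, duration_s):
    """Average acceleration of a collision, in multiples of g."""
    return velocity_change_ft_s / duration_s / G_FT_PER_S2


def predicts_failure(velocity_change_ft_s, duration_s,
                     criterial_velocity_change_ft_s, criterial_accel_g):
    """Asymptote-only test: failure needs both criterial values exceeded."""
    accel_g = average_acceleration_g(velocity_change_ft_s, duration_s)
    return (velocity_change_ft_s > criterial_velocity_change_ft_s
            and accel_g > criterial_accel_g)


# Criterial values quoted in the text for man: 25 ft/s and roughly 20 g.
MAN = dict(criterial_velocity_change_ft_s=25.0, criterial_accel_g=20.0)

# Hypothetical collisions (velocity change in ft/s, duration in s).
print(predicts_failure(30.0, 0.01, **MAN))  # brief and abrupt
print(predicts_failure(30.0, 1.00, **MAN))  # same velocity change, gentle
print(predicts_failure(10.0, 0.01, **MAN))  # abrupt but small velocity change
```

Only the first hypothetical collision exceeds both thresholds. The second stays below the criterial average acceleration and the third stays below the criterial velocity change.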

A simple rule of thumb relates the criterial velocity change ($V_0$) and
criterial average acceleration ($G_0$) to the system's natural frequency
($\omega$) (Kornhauser, 1964):

$G_0 = \omega V_0 / g$

If most collisions between systems and their surrounds are of sufficiently
short duration to place the systems on the horizontal asymptote of their
collision sensitivity function, then $V_0$ is constant. (For mammals, as noted
above, $V_0$ = 25 ft/s.) In other words, the higher the value of a system's
natural frequency, the greater is the system's tolerance to collision
(measured in multiples of the gravitational constant).
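
As a rough check on this rule of thumb, the sketch below assumes it has the proportional form G0 = w * V0 / g, with the natural frequency w in radians per second. That form is an assumption, inferred from the statement that tolerance in multiples of g rises with natural frequency when V0 is constant. The function names are hypothetical; the criterial values are those quoted earlier in the appendix.

```python
G_FT_PER_S2 = 32.2  # standard gravity in feet per second squared


def criterial_accel_g(natural_frequency_rad_s, criterial_velocity_change_ft_s):
    """Tolerance in multiples of g under the assumed rule G0 = w * V0 / g."""
    return (natural_frequency_rad_s * criterial_velocity_change_ft_s
            / G_FT_PER_S2)


def implied_natural_frequency(accel_g, criterial_velocity_change_ft_s):
    """Invert the assumed rule to get the natural frequency in rad/s."""
    return accel_g * G_FT_PER_S2 / criterial_velocity_change_ft_s


V0 = 25.0  # ft/s, the approximate mammalian value given in the text

# Criterial average accelerations quoted in the text: man 20 g, mice 650 g.
for species, g0 in (("man", 20.0), ("mice", 650.0)):
    omega = implied_natural_frequency(g0, V0)
    print(f"{species}: about {omega:.1f} rad/s")
```

Under this assumed form, the much higher tolerance of mice corresponds to a natural frequency more than thirty times that of man, which is the direction of the relation the text describes.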
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ON THE PERCEPTION OF INTONATION FROM SINUSOIDAL SENTENCES* 
Robert E. Remez† and Philip E. Rubin

Abstract. Listeners can perceive the phonetic value of sinusoidal
imitations of speech. These tonal replicas are made by setting
time-varying sinusoids equal in frequency and amplitude to the
computed peaks of the first three formants of natural utterances.
Like formant frequencies, the three sinusoids composing the tonal
signal are not necessarily related harmonically, and therefore are
unlikely to possess a common fundamental frequency. Moreover, none
of the tones falls within the frequency range typical of the
fundamental frequency of phonation of the natural utterances upon which
sinusoidal signals are based. Naive subjects nevertheless report
that intelligible tonal replicas of sentences exhibit unusual
"vocal" pitch variation, or intonation. Our present study attempted to
determine the acoustic basis for this apparent intonation of
sinusoidal signals by employing several tests of perceived similarity.
Listeners judged the tone corresponding to the first formant to be
more like the intonation pattern of a sinusoidal sentence than
either: (A) a tone corresponding to the second or to the third
formant; (B) a tone presenting the computed missing fundamental of the
three tones; or (C) a tone following a plausible fundamental
frequency contour generated from the amplitude envelope of the signal.
Additionally, the tone reproducing the first formant pattern was
responsible for apparent intonation even when it occurred in
conjunction with a lower tone representing the fundamental frequency
pattern of the natural utterance on which the replica was modeled.
The effects were not contingent on relative tone amplitude within the
sentence replica. The case of sinusoidal sentence "pitch" resembles
the phenomenon of dominance, that is, the general salience of
waveform periodicity in the region of 400-1000 Hz for perception of the
pitch of complex signals.

*Also Perception & Psychophysics, 1984, 35, 429-440.

Introduction

A number of recent studies of speech perception have examined the effects
of sinusoidal replication of speech signals (Bailey, Summerfield, & Dorman,
1977; Best, Morrongiello, & Robson, 1981; Grunke & Pisoni, 1982; Schwab,
1981). Typically, such tonal analogs of speech are composed of three
time-varying sinusoids, each tone reproducing the frequency and amplitude
variation, sometimes schematically, of a formant from a natural utterance. In
such acoustic patterns, devoid of the harmonic series and broadband formant
structure of natural speech, the short-time acoustic properties are
unmistakably not speechlike. Both acoustically and perceptually, sinusoidal
signals are grossly unnatural, and naive listeners tend therefore to perceive
sinusoidal sentences merely as several covarying tones unless they expect to
hear a linguistic message; moreover, phonetic perception fails to occur unless
the tonal stimulus is adequately structured, indicating that an explanation of
this effect should be sought in terms of the information provided by these
atypical stimuli (Remez, Rubin, Pisoni, & Carrell, 1981). When sinusoidal
patterns are perceived phonetically, they are judged to be intelligible yet
unspeechlike, presumably because they convey segmental information in an
abstract pattern of spectrum variation with almost none of the typical
acoustic details of natural speech.
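
The construction of such a tonal analog can be sketched as follows. This is a minimal illustration, not the synthesis procedure used in the study: it assumes each formant track is given as one (frequency, amplitude) pair per analysis frame, holds those values constant within a frame, and accumulates phase so that each tone stays continuous when its frequency changes. The function name, frame rate, sample rate, and formant values are all hypothetical.

```python
import math


def synthesize_tone_replica(tracks, frame_rate, sample_rate):
    """Sum time-varying sinusoids, one per formant track.

    tracks: list of tracks; each track is a list of (frequency_hz, amplitude)
    pairs, one pair per analysis frame. All tracks have the same length.
    """
    n_frames = len(tracks[0])
    samples_per_frame = sample_rate // frame_rate
    phases = [0.0] * len(tracks)  # running phase of each tone, in radians
    samples = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for i, track in enumerate(tracks):
                frequency, amplitude = track[frame]
                # Advance the phase by one sample at the current frequency.
                phases[i] += 2.0 * math.pi * frequency / sample_rate
                value += amplitude * math.sin(phases[i])
            samples.append(value)
    return samples


# Hypothetical two-frame tracks standing in for the first three formants.
tracks = [
    [(500.0, 1.00), (550.0, 1.00)],    # tone following the first formant
    [(1500.0, 0.50), (1400.0, 0.50)],  # tone following the second formant
    [(2500.0, 0.25), (2600.0, 0.25)],  # tone following the third formant
]
wave = synthesize_tone_replica(tracks, frame_rate=100, sample_rate=8000)
print(len(wave), "samples")
```

Nothing in the result has a harmonic series or broadband formant structure; only the pattern of frequency and amplitude change over time is carried over from the utterance.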

One consequence of this finding is methodological. This technique for
transforming the signal can be used to reveal the perceptual significance of
time-variation in the speech stream. This is so precisely because such
unspeechlike signals disentangle the pattern of frequency variation over time
in the speech stream from the sequence of particular momentary acoustic
elements that are produced by vocal articulation. In view of the acoustic
differences between sinusoidal signals and the natural utterances that they
replicate, it seems fair to suppose that sinusoidal replication does not
merely reduce the amount of information present in the signal, as minimal-cue
speech synthesis does (for example, Delattre, Liberman, & Cooper, 1955; and
Abramson & Lisker, 1965). In that technique, a subset of the acoustic
ingredients of an utterance is selected for imitating synthetically.
Obviously, the information provided by natural acoustic elements is lost if
those elements fail to appear in the synthetic replica. In such circumstances,
phonetic information may or may not be adequately conveyed by the remaining
acoustic structure. Therefore, this minimalist method is designed to reveal
the effectiveness of particular acoustic elements (for example, a burst of
noise, a low frequency murmur, or a prescribed frequency transition in the
second formant) when others have been neutralized or eliminated.

In contrast, the transformation of a speech signal into time-varying
sinusoids does not preserve particular constituents of the acoustic signal
while discarding others. Rather, it destroys the physical similarity between
acoustic moments in natural speech and those in sinusoidal patterns. The
residual similarity between speech and sinusoidal imitations is to be found
only in the variation of the two kinds of signal, and specifically in the
pattern of frequency variation over time. For this reason, a significant
aspect of the sinusoidal replication technique would be obscured by
classifying the signals simply as "impoverished stimuli." They are, in fact,
literal imitations of the time-varying properties of the supralaryngeal
vocal-tract resonances. Sinusoidal signals of this type present the pattern of
resonance center-frequency variation through an utterance, although the
signals obviously do not contain formant structure. Our tests (Remez et al.,
1981) have established the sufficiency of this acoustic abstraction of the
speech signal, in contrast to research that more customarily demonstrates the
perceptual uses of selected brief pieces of the signal. When perceivers detect
phonetic structure in sinusoidal patterns, this reveals the usefulness of the
forms of stimulus change as phonetic information, and the independence of
perception from most of the specific acoustic details with which the forms are
conveyed.

In an obvious way, however, sinusoidal replicas of speech are
impoverished, despite all. The principal perceptual correlate of sentence
intonation, the fundamental frequency of phonation (Lieberman, 1967), is
absent from sinusoidal signals, which imitate only the frequency variation of
the formant peaks. As a result of this deficiency, listeners have consistently
reported that sinusoidal sentences exhibit noticeably weird patterns of
intonation. The perception of relative syllable stress (Fry, 1958; Lehiste &
Peterson, 1959; Morton & Jassem, 1965), or of the placement of clause
boundaries (Collier & 't Hart, 1975; Lehiste, 1973; Streeter, 1978), each of
which is said to follow occasionally from normal intonation, must therefore be
supported (if at all) by other means because the anomalous intonation of
sinusoidal replicas is quite different from the normal intonation patterns to
which these roles are attributed. To the same extent that the fundamental
frequency of an utterance also contributes segmental information (about
consonant voicing [Summerfield & Haggard, 1977] or vowel identity [House &
Fairbanks, 1953], for example), the listener will also be forced to rely on
other, alternative sources.

But why do sinusoidal signals create this impression of peculiar
intonation in the first place? Prosodic perception is an admittedly complex affair,
in which the properties of a single piece of the acoustic stream may affect
the recognition of segmental, syllabic, and syntactic structural properties
together. In the sinusoidal case, it seems that the pattern of tones
imitating only the formant variation inadvertently presents an effective stimulus
for perceiving intonation. It is far from obvious why three tones in the
frequency range of formants should lead to this impression of vocal pitch, for
the acoustic properties corresponding to intonation typically occur several
octaves below the lowest formant, and, consequently, below the lowest
frequency tone in our three-tone patterns. We undertook the present study to
identify the acoustic and perceptual basis for this peculiar concomitant of phonetic
perception with sinusoidal signals. The first experiment described here
determined which of the likely acoustic sources for the anomalous intonation
would in fact be identified as the correlate of sinusoidal intonation. The
second experiment tested the salience of the empirically determined acoustic
correlate of sinusoidal intonation, the tone reproducing the pattern of the
first formant (Tone 1), as a function of its relative amplitude in the
three-tone pattern. The third experiment revealed that subjects did not hear
the intonation of a sinusoidal sentence as the correlate of Tone 1 when that
tone was removed from the sinusoidal sentence pattern. Finally, the fourth
experiment that we describe found that the intonation of a four-tone pattern,
composed of three sinusoids imitating formant variation and a fourth imitating
fundamental frequency variation, was again correlated with the first formant
tone and not with the lowest frequency tone of the pattern, complementing the
results of the first three studies.

Experiment 1

From the outset, there seemed to be at least three potential causes of
the perceptual impression that sinusoidal replicas of natural utterances
possess "odd" intonation. First, the apparent speech melody may be the
listener's invention, given that the structure of the sinusoidal signal is
defective precisely in representing the fundamental frequency of the original
utterance. Typical synthetic speech, on the other hand, is generated with a
fundamental frequency pattern as well as a sequence of spectrum envelopes
approximating the natural case. In the sinusoidal instance, the listener may
fabricate an intonation pattern from the variation in the amplitude envelope
of the signal, which is correlated with variation in fundamental frequency in
the natural case (Lieberman, 1967), and which also is represented faithfully
in sinusoidal replications of natural utterances.

Second, the listener may induce a pitch contour based on whatever
changing harmonic relationships exist among the three tones of the sinusoidal
pattern. The three tones are not likely to be related harmonically at any given
instant, because they follow the computed resonance peaks and not the
frequencies of the harmonics of the fundamental closest to the formant centers.
Nonetheless, there may be a kind of auditory induction occurring, based on the
varying relation of the frequencies of the three simultaneous tones, that
produces a time-varying residue heard as the intonation contour. This
possibility would be similar to the induction of the missing fundamental
(Licklider, 1956; Schouten, 1940).

Third, the listener may use one of the three tones both for segmental
information and for intonation information. Although the principal acoustic
correlate of sentence intonation is the fundamental frequency, and although
the fundamental frequency is present in the speech spectrum at an average of
two octaves below the first formant, both psychophysical and
electrophysiological evidence suggest that listeners may detect the fundamental frequency of
natural utterances by attending to the periodicity of the harmonics of the
fundamental in the vicinity of the first formant (Greenberg, 1980). If an
extrapolation of those findings is appropriate to the sinusoidal case, we
would expect the apparent intonation to be based on the pitch of the tone
replicating the first formant of the natural utterance on which it is modeled.

To determine the basis for the apparent intonation of sinusoidal
sentences, we performed a test of the apparent similarity of pitch contours, in
which subjects judged one member of a pair of tone patterns as more like the
speech melody of a sinusoidal sentence. The set of candidate intonation
patterns included each of the three tones of the sinusoidal sentence pattern
presented individually, a plausible fundamental frequency pattern derived from
the amplitude envelope of the sinusoidal sentence, and a tone that reproduced
the pattern derived by computing the greatest common divisor of the three
tones at intervals of 10 ms throughout the sentence. On each trial, the
subject was asked to identify the sentence melody of a three-component sinusoidal
sentence presented once, and then to select the single-tone pattern from the
two alternatives that was more like the melody of the sentence.

Method

Subjects. Fifteen adults with normal hearing in both ears were recruited
by handbill from the combined populations of Barnard and Columbia Colleges.
All were native speakers of English, and none had participated in other
experiments employing sinusoidal signals. Subjects were paid for their
services.

Stimuli. The acoustic materials used in this test consisted of six
sinusoidal patterns (one three-tone sentence pattern and five single-tone
patterns) produced by the sinewave synthesizer at Haskins Laboratories. This
software synthesizer generates sinusoidal patterns defined by parameters of
frequency and amplitude for each tone, updated, in this case, at the rate of
10 ms per parameter frame. The initial synthesis parameters were obtained by
analyzing a natural utterance, the sentence "Where were you a year ago?"
produced by one of the authors. This utterance was recorded on audiotape in a
sound-attenuating chamber and converted to digital records by a VAX
11/780-based pulse-code modulation system using a 5 kHz low-pass filter on
input and a sampling rate of 10 kHz. At 10 ms intervals, center-frequency and
amplitude values were determined for each of the three lowest formants in this
utterance by the analysis technique of linear prediction. In turn, these
values were used as sinewave synthesis parameters after correcting the errors
that linear prediction is prone to commit. Generally, inappropriate values
are easy to identify in the parameter table. They are likeliest to be found
when the formant extraction routines are unable to identify any amplitude
peaks in the spectrum, for example, when amplitude is low due to consonant
closures. Formant patterns are also corrected if the analysis designates an
extraneous "formant," which displaces the proper values to the next highest or
lowest formant, for example, during rapid spectrum change. A full description
of sinusoidal replication of natural speech is provided by Remez, Rubin, and
Carrell (1981).
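The frame-based synthesis described above can be illustrated with a minimal sketch. It is not the Haskins synthesizer: the function names are hypothetical, and the sketch assumes that each frame's frequency and amplitude are held constant for the 10 ms frame and that phase is accumulated across frames so the waveform stays continuous. Only the 10 ms frame length and the 10 kHz sampling rate come from the text.

```python
import math

def synthesize_tone(freqs_hz, amps, frame_ms=10, sample_rate=10_000):
    """Render one time-varying sinusoid from per-frame parameters.

    freqs_hz and amps hold one value per frame. Holding each value
    constant within its frame is an assumption of this sketch.
    """
    samples_per_frame = sample_rate * frame_ms // 1000
    phase = 0.0
    samples = []
    for freq, amp in zip(freqs_hz, amps):
        step = 2.0 * math.pi * freq / sample_rate  # phase advance per sample
        for _ in range(samples_per_frame):
            samples.append(amp * math.sin(phase))
            phase += step  # running phase avoids jumps at frame edges
    return samples

def synthesize_sentence(tones, frame_ms=10, sample_rate=10_000):
    """Sum several tones, each given as a (freqs_hz, amps) pair."""
    rendered = [synthesize_tone(f, a, frame_ms, sample_rate) for f, a in tones]
    return [sum(column) for column in zip(*rendered)]

# Two illustrative frames of a three-tone pattern (made-up values).
sentence = synthesize_sentence([
    ([500, 520], [1.0, 1.0]),    # Tone 1, following the first formant
    ([1500, 1480], [0.5, 0.5]),  # Tone 2
    ([2500, 2500], [0.25, 0.25]) # Tone 3
])
```

Each 10 ms frame yields 100 samples at 10 kHz, so the two-frame example produces 200 samples.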

The sentence pattern that was matched to the natural utterance was
composed of three time-varying sinusoids. Tone 1 corresponded to the first
formant, Tone 2 to the second, and Tone 3 to the third. A Fourier spectrum for a
section of the three-tone pattern is shown in Figure 1A. Note that the
relative energy in the three tones decreases with increasing frequency, imitating
the natural case, but that the broadband formant and harmonic structure common
to voiced speech is not present. The five alternative single-tone patterns
that were used to compose the pairs of alternatives were: Tone 1, Tone 2, or
Tone 3, each a component of the sentence pattern that the subject heard at the
beginning of each trial; a plausible fundamental frequency pattern (PF0)
computed from the amplitude envelope; and the pattern comprising the values
of the greatest common divisors (GCD) of the three concurrently varying tones
in the replica of the natural utterance, computed for each 10 ms frame of the
sinusoidal synthesis parameters. Each of the single-tone alternatives was
produced with equal average power. The PF0 was derived by modulating the
frequency of a 100 Hz tone to follow the changes in amplitude of the waveform of
the sinusoidal sentence. The maximum range of this tone was 20 Hz, and the
maximum rate of frequency change was 1 Hz/10 ms. Finally, the frequency
values for synthesizing the "missing fundamental" tone were determined by
computing the integer, for each synthesis frame, of greatest value that served as a
divisor for each tone frequency, with no more than a 2% remainder. The
average frequency value of this plausible missing fundamental tone was 92 Hz, well
within the fundamental range of the talker producing the original utterance
from which these six tonal patterns were derived. The amplitude values of
this tone were matched for each 10 ms frame to the amplitude values of Tone 1.
A graphic representation of each of the five single-tone patterns is presented
in Figure 1B.
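The GCD computation can be sketched as follows. The text does not say how the 2% remainder was measured, so this sketch makes an assumption: a candidate divisor is accepted when every tone frequency lies within 2% of its own value from the nearest integer multiple of the candidate. The function name and the sample frequencies are hypothetical.

```python
def approx_gcd(freqs_hz, tolerance=0.02):
    """Largest integer that approximately divides every frequency.

    A divisor "fits" a frequency when the nearest positive multiple of
    the divisor is within tolerance * frequency. This reading of the
    2% remainder criterion is an assumption of the sketch.
    """
    def fits(freq, divisor):
        multiple = max(1, round(freq / divisor)) * divisor
        return abs(freq - multiple) <= tolerance * freq

    # Search downward so the first hit is the greatest divisor.
    for divisor in range(int(min(freqs_hz)), 0, -1):
        if all(fits(f, divisor) for f in freqs_hz):
            return divisor
    return None

# One value per 10 ms synthesis frame (illustrative frequencies only).
frames = [[300, 1200, 2400], [310, 1240, 2480]]
gcd_track = [approx_gcd(frame) for frame in frames]
```

Applied frame by frame, the function returns a frequency track that could drive a single tone, which is how the "missing fundamental" alternative is described above.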


The synthesized test materials were converted from digital records to
analog signals, recorded on audiotape at Haskins Laboratories, and were
presented to listeners in the Perception Laboratory of the Department of
Psychology, Barnard College, by playback of the audiotape. Average signal
levels were set at 72 dB SPL. Stimuli were delivered binaurally in an
acoustically shielded room over Telephonics TDH-39 headsets.

Figure 1. (A) This panel is the Fourier spectrum of a representative section
through the three-tone pattern replicating the sentence "Where were
you a year ago?" (B) The five tone patterns used as stimuli in
Experiment 1. Top panel: The three-tone pattern replicating the
first three formant center-frequency values of the sentence "Where
were you a year ago?" Middle panel: The pattern composed of the
greatest common divisors (GCD) of the three tones in the sentence
replica, computed for each 10-ms synthesis frame. Bottom panel: A
plausible fundamental frequency pattern (PF0), computed from the
amplitude envelope of the sentence pattern. In all cases,
variation in thickness represents amplitude variation.

Procedure. Listeners were instructed that the experiment was examining
the identifiability of vocal pitch, the tune-like quality, of synthetic
sentences. To illustrate the independence of phonetic structure and sentence
melody, the experimenter sang the phrases "My Country 'Tis of Thee" and "I
Could Have Danced All Night" with the original melodies and with the melodies
interposed. When subjects acknowledged they could determine the melody of a
sentence regardless of the words, they were instructed a) to attend on each
test trial to the pitch changes of the sinusoidal sentence, b) to identify the
pattern, and c) to select which of the two patterns more closely resembled the
pitch of the sentence. Subjects recorded their choices in specially prepared
response booklets.

Each trial had the same format. First, the sinusoidal sentence "Where
were you a year ago?" was presented once. Then, one of the five single-tone
patterns was presented. Finally, a second single-tone pattern was presented.
There were ten different comparisons among the five different single-tone
alternatives. Counterbalanced for order, each subject judged each different
comparison ten times. Each sinusoidal pattern was approximately 1400 ms in
duration; the interval between items within a trial was 1 s; and the silent
interval between trials was 3 s.
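The trial structure can be checked with a short sketch. It assumes that counterbalancing meant five presentations in each order per comparison, and that a trial consists of three items of about 1400 ms, two 1 s gaps, and the 3 s silent interval; both are inferences from the description, and the names are hypothetical.

```python
from itertools import combinations

ALTERNATIVES = ["Tone 1", "Tone 2", "Tone 3", "PF0", "GCD"]

def build_trials(alternatives, reps_per_pair=10):
    """List (first, second) alternatives, half the repetitions in each order."""
    trials = []
    half = reps_per_pair // 2
    for first, second in combinations(alternatives, 2):
        trials += [(first, second)] * half
        trials += [(second, first)] * (reps_per_pair - half)
    return trials

trials = build_trials(ALTERNATIVES)

# Sentence plus two alternatives, two 1 s gaps, 3 s between trials.
trial_ms = 3 * 1400 + 2 * 1000 + 3000
session_ms = len(trials) * trial_ms
```

Five alternatives give ten unordered pairs, so ten repetitions of each pair give 100 trials per subject under these assumptions.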

Results and Discussion

An analysis of variance was used to identify the differences among the
means of subjects' performance in the differential similarity test.
Irrespective of the order of alternatives within a trial, there were ten
different trial types comparing tonal alternatives: Tone 1 vs. Tone 2; Tone 1
vs. Tone 3; Tone 1 vs. PF0; Tone 1 vs. GCD; Tone 2 vs. Tone 3; Tone 2 vs.
PF0; Tone 2 vs. GCD; Tone 3 vs. PF0; Tone 3 vs. GCD; and PF0 vs. GCD. For
each type of trial, a signed value indexing the preference for one alternative
or the other was computed by taking the difference of the number of trials
(out of ten) on which the subject selected the first alternative versus the
second. (The order of alternatives used to determine the sign of the
difference was the order of the alternatives given directly above.) Note that if
the subject had no consistent preference within a trial type, the index value
approached 0, while a consistent preference approached (+ or -) 10. The
one-way analysis of variance revealed a significant difference among the means
of the similarity scores on different trial types, F(9,126) = 11.8, p < .001.
Scheffe post hoc means tests showed that Tone 1 was preferred to every
alternative with which it was compared, but that in trials excluding Tone 1 the
greatest performance difference was not significant. Histograms of the group
data are shown in Figure 2. The figure represents the proportion of trials on
which each alternative in each comparison type was selected. From this figure
it seems clear that the tone replicating the first formant is chosen as the
sentence pitch in any comparison that includes it (2A), and that in every
other case the choice of tone is equivocal (2B).
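The signed preference index described in this paragraph amounts to a simple difference of counts. A minimal sketch follows, with a hypothetical function name and a hypothetical coding of responses as "first" or "second".

```python
def preference_index(choices):
    """Signed preference for one trial type.

    choices holds one entry per trial, "first" or "second", naming the
    alternative selected (in the fixed comparison order, not the order
    of presentation). The result is the count of "first" minus the
    count of "second".
    """
    first = sum(1 for c in choices if c == "first")
    second = sum(1 for c in choices if c == "second")
    return first - second

# A subject choosing Tone 1 on nine of ten "Tone 1 vs. Tone 2" trials.
score = preference_index(["first"] * 9 + ["second"])
```

With ten trials per type, the index runs from -10 to +10, and an even split gives 0, matching the description above.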

Figure 2. Differential similarity data from Experiment 1. (A) Comparisons in
which Tone 1 was an alternative. (B) Remaining comparisons.

The outcome supports a few conclusions about the cause of the odd
intonation of sinusoidal sentences. It seems that the tone that replicates the
first formant of the natural utterance is put to two uses, perceptually, by
listeners. Although it seems to provide segmental information about
consonants and vowels, as we expected, it also serves as the acoustic correlate of
sentence pitch, a function usually attributed to the fundamental frequency of
phonation. This outcome seems surprising because Tone 1 in sinusoidal
sentences is typically one and one-half octaves higher than the fundamental, as
is the first formant in natural utterances. Moreover, Tone 1 would be quite
beyond the comfortable phonatory range of adults capable of producing the
associated formant frequencies. The perceptual preference for Tone 1 as the
intonation of the sentences is not to be expected, therefore, if perception is
based primarily on the listener's knowledge of the normative articulatory
abilities of talkers. Although some evidence implies that the variation in
fundamental frequency in natural speech is correlated with formant pattern
(House & Fairbanks, 1953; Lehiste & Peterson, 1961), no current proposal also
suggests that the perceiver uses the first formant frequency variation as
information both for intonation and for segment identification. This,
however, seems to have occurred in the case of sinusoids.

Research on the phenomenon of the dominance region (for example, Plomp,
1967; Ritsma, 1967; see also Greenberg, 1980) may begin to explain this
result. These studies established that the impression of pitch corresponds to
the shared fundamental period of the third through fifth harmonics, and not to
the periodicity of excitation occurring in the lower or higher frequencies.
In the nonspeech case (Plomp, 1967), listeners judged the apparent pitch of
complex signals composed simultaneously of two different harmonic series.
Each series presumably could have led to the impression of a different pitch,
but the series falling within the "dominant" region in fact determined the
pitch. In the speech case, Greenberg (1980) recorded evoked potentials to
synthetic vowels in human subjects. He found that the auditory representation
of fundamental frequency was strongest when the first formant occurred within
the dominant region. If the impression of pitch is obtained from analysis of
this band in the auditory representation, then the implication of this work is
clear: A person listening to speech normally uses the region of the spectrum
associated with the first formant to obtain periodicity information as well as
to detect the frequency of the first formant itself. Ordinarily, the
periodicity of the stimulus in this region and the frequency of the first
formant will differ, although in the present case they happen to be identical.

We cannot be sure, however, that Tone 1 is selected for its prosodic role
for any reason other than that it is the loudest tone in the three-tone complex.
Recall that the parameters specified for each time-varying sinusoid in the
replication of the natural utterance include a formant center frequency and a
formant amplitude specification, both derived by linear prediction analysis of
the speech waveform. Because the first formant in natural speech commonly has
the greatest energy and each higher formant less energy, this spectrum
envelope rolloff is therefore preserved in the sinusoidal imitation. To
identify the relation between the selection of Tone 1 as the pitch contour of
the sinusoidal sentence and its relatively great acoustic power, we performed
Experiment 2. In addition, we attempted to test the generality of our finding
by using a new sentence.

Experiment 2

In this portion of our study, we varied the relative amplitudes of the
three tones composing the sinusoidal sentence, and again employed a test of
differential similarity to determine the alternative most similar to the
intonation of the sinusoidal sentence. If, in Experiment 1, Tone 1 was judged
most similar in pitch pattern to the intonation of the sinusoidal sentence
merely because Tone 1 had the greatest energy of the three components of the
sentence pattern, then this should not recur when the relative amplitude
differences of the three tones are eliminated, or reversed. On the other
hand, if Tone 1 is the stimulus for intonation because it occurs within the
dominance region, then we should not expect amplitude variation to change the
differential similarity performance, as long as Tone 1 is detectable (Ritsma,
1967). Experiment 2, therefore, estimated the effects on apparent intonation
of equating the amplitudes, and of inverting the order of amplitudes, among
the tones of a three-component replica of a natural utterance. In addition,
we also used a different sentence in order to identify any effects that may
have been particular to the phonetic properties of the sentence used in the
first experiment.



Method 



Subjects. Fourteen listeners were again drawn from the local population
of audiologically normal undergraduates. None had been tested previously in
studies of sinusoidal synthesis. Subjects received pay for participating.

Stimuli. A three-tone replica was prepared for the sentence "I read a
book today," according to the procedure described in Experiment 1. Two
versions were subsequently made from this replica. In the first, the tone
amplitudes were set equal; in the second, the amplitude order was the inverse of
the natural case, with Tone 3 possessing the greatest power and Tone 1 the
least. Figure 3A shows the pattern of three tones composing the sentences.
Figures 3B and 3C show Fourier spectra of sections of the equal (flat)
amplitude and uptilted amplitude versions of this sentence.
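One way to picture the two amplitude manipulations is sketched below. The text does not specify how the amplitudes were equated or inverted, so both functions are assumptions: the flat version gives every tone the per-frame mean amplitude, and the uptilted version reverses the assignment of amplitude tracks to tones. The names and values are hypothetical.

```python
def flat_version(amp_tracks):
    """Give every tone the same per-frame amplitude (the mean across tones).

    amp_tracks is a list of per-tone amplitude lists, one value per frame.
    Using the frame mean is an assumption of this sketch.
    """
    n_tones = len(amp_tracks)
    frame_means = [sum(frame) / n_tones for frame in zip(*amp_tracks)]
    return [list(frame_means) for _ in amp_tracks]

def uptilted_version(amp_tracks):
    """Reverse the amplitude order, so the highest tone gets the most power."""
    return [list(track) for track in reversed(amp_tracks)]

# Natural rolloff over two frames: Tone 1 strongest, Tone 3 weakest.
natural = [[1.0, 0.8], [0.5, 0.4], [0.25, 0.2]]
flat = flat_version(natural)
uptilted = uptilted_version(natural)
```

In either version the frequency tracks are untouched, so the pitch contour of each tone is the same as in the natural-rolloff replica.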

The single-tone alternative patterns to be compared with the apparent
intonation of the sinusoidal sentence on each trial consisted this time simply
of each of the three individual tones composing the sentence. The single-tone
alternatives were prepared as in Experiment 1, with equal average power. On
each trial, the subject heard one of the two versions of the sentence, with
the flat or the uptilted spectrum, followed by two of the three alternative
tone patterns.

Procedure. Each trial began with a single presentation of the sinusoidal
sentence "I read a book today," in either the flat or the uptilted spectrum
version. Two single-tone alternatives followed, from which the subject
selected the better match to the apparent intonation of the sentence.
Collapsing over the counterbalancing of order for each pair of alternatives,
there were three different types of trials: the comparison of Tones 1 and 2,
Tones 1 and 3, and Tones 2 and 3. Each of these was presented twenty times,
ten times in each order. In addition, twelve trials were interspersed in the
test order in which a normal spectrum relationship occurred among the tones of
the sentence, although the overall power was greatly reduced. The only
alternative tonal intonation patterns presented for this quiet, normal-amplitude
rolloff sentence were Tones 1 and 2. The data from this condition served as a
converging check on the outcome of the first experiment.

One hundred and thirty-two trials were presented in this test. On each
trial, the subject first identified the intonation of the sinusoidal sentence
presented and then selected the more similar of the two lagging alternative
tone patterns. The choice was recorded in pencil, on a specially prepared
response booklet. There were intervals of 1 s between items within a trial, 3 s
between trials, and 8 s following every twelfth trial.

Figure 3. (A) The three-tone pattern replicating the sentence "I read a book
today." (B) Fourier analysis of a section through the flat
spectrum version of the sentence (equal energy in each tone). (C)
Fourier analysis of a section through the uptilted spectrum
version. Compare to Figure 1A.

Results and Discussion

The judgments were handled in a manner analogous to Experiment 1. Signed
preference scores were determined for the three comparisons of the flat and
uptilted sentence conditions. For each comparison, the difference was
computed between the number of trials on which the first alternative was chosen and
the number of trials on which the second was chosen. In the computation of
the difference scores, the alternatives were compared in this order: Tone 1
vs. Tone 2, Tone 1 vs. Tone 3, Tone 2 vs. Tone 3. A two-way repeated measures
analysis of variance, with the factors SENTENCE (flat vs. uptilt) and
COMPARISONS (Tones 1 vs. 2, 1 vs. 3, and 2 vs. 3), was used to determine whether there
was an effect of tone amplitude in the sentence on the perception of
intonation. The data from the quiet, normal-amplitude trials, in which Tone 1 was
the clear preference, were omitted from this analysis.

The group data are shown in Figure 4. It is obvious from that figure
that Tone 1 retains its preferred status. This is confirmed by the analysis
of variance. There was a main effect of sentence type, indicating that the
preference scores were more consistent for the flat than for the uptilted
sentences, F(1, 13) = 9.5, p < .01; in addition, there was a main effect of
trial type, F(2, 26) = 10.1, p < .001, with Tone 1 preferred in each of the two
pairs in which it occurred, and no consistent preference between Tones 2 and
3. The interaction term was nonsignificant, F(2, 26) = 0.6, p > .1,
indicating that the subjects preferred Tone 1 as the best match for sentence
intonation regardless of the spectrum manipulation.
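The degrees of freedom reported for these tests can be checked against the number of subjects. The sketch below uses the standard rule for a fully within-subject design, in which an effect has the product of (levels - 1) over its factors as its degrees of freedom and the error term multiplies that by (subjects - 1). The function name is hypothetical.

```python
def within_effect_df(n_subjects, *levels):
    """Return (effect df, error df) for a within-subject effect.

    Pass one level count for a main effect, or one count per factor
    for an interaction.
    """
    effect = 1
    for k in levels:
        effect *= k - 1
    return effect, effect * (n_subjects - 1)

# Experiment 2: 14 subjects, SENTENCE (2 levels) x COMPARISONS (3 levels).
sentence_df = within_effect_df(14, 2)        # main effect of sentence type
comparison_df = within_effect_df(14, 3)      # main effect of trial type
interaction_df = within_effect_df(14, 2, 3)  # interaction term

# Experiment 1: 15 subjects, one factor with ten trial types.
experiment1_df = within_effect_df(15, 10)
```

These values agree with the reported F(1, 13), F(2, 26), and F(2, 26) for Experiment 2 and with F(9, 126) for Experiment 1.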

This experiment supports the conclusion of our first experiment on
sinusoidal intonation. It seems that the functions of Tone 1 include both the
segmental use typically associated with the first formant that it replicates, and
the use typically identified with the fundamental frequency of phonation in
natural speech. The durability of the listener's reliance on Tone 1 for
intonation information is noteworthy, especially considering the inversion of the
order of relative amplitudes among the tonal components of the signal. It
suggests that the dual use of Tone 1 in sinusoidal sentences is brought about
by virtue of its occurrence within the dominance region, and not because it is
the component with greatest power. Periodicity within this frequency band,
including instances of relatively low power, evidently determines the pitch
pattern of the perceived sentence. It seems, then, that Tone 1 is
concurrently represented as an amplitude peak in the spectrum, which provides
information about segmental phonetic properties of the utterance, and also as a
periodic spectrum element that determines the apparent pitch of the tonal complex.
Ordinarily, in speech, the frequencies occurring within this region are
harmonics of the fundamental frequency of phonation. However, in this
anomalous case of formant center frequencies without harmonic excitation,
there is no stimulation, periodic or otherwise, in the range of a talker's
fundamental, and therefore no harmonics in the dominance region. There is
simply the time-varying frequency of the tone following the formant, which is
treated as the stimulus for pitch by default, regardless of its amplitude
relative to the other components.

To conclude that the intonation of a sinusoidal replica is the correlate
of Tone 1, and that this is attributable primarily to the occurrence of this
time-varying periodic tone within the dominance region of the auditory system,
we must establish that listeners reject Tone 1 as the best match of sinusoidal
sentence intonation when the sentence does not include that tone. In other
words, if a two-tone pattern, including only Tones 2 and 3, is presented in
the same paradigm as in Experiments 1 and 2, listeners should not report that
Tone 1 matches the intonation of this pattern. Were they to persist in
identifying Tone 1 as the intonation pattern, we would be forced to conclude
that the phenomenon of sinusoidal intonation is less a matter of the ordinary
perception of extraordinary signals, as we have alleged, and actually is a
matter of special induction of ad hoc attributes of an unfamiliar stimulus.
Experiment 3 was performed to test whether Tone 1 is identified as the
correlate of intonation for patterns that do not contain it.

Experiment 3

At this point, the evidence shows that the tone following the frequency
variation of the first formant is the correlate of the intonation of
sinusoidal sentences. In all cases, Tone 1, corresponding to the track of the first
formant, was judged more like the sentence intonation than any other
candidate. Our conclusion has emphasized the listener's tendency to identify
the periodicity of the stimulus by attending to the dominance region, and to
perceive pitch from the representation of stimulus frequency within that
region. Independent evidence from studies of nonspeech tones and vowels
supports the general conclusion that frequency in the dominance region causes
apparent pitch, even for natural speech. Hence, the explanation of sinusoidal
intonation that we offer is that these atypical stimuli are evaluated
perceptually in essentially the same manner as are nonspeech tonal complexes
and speech sounds.


However, simply because subjects choose Tone 1 consistently as the best
match to apparent pitch does not mean that Tone 1 is causing the pitch
percept. To support this characterization of the perception of sinusoidal
intonation, we must determine that subjects do not select Tone 1 when it is absent
from the tonal sentence. If subjects select Tone 1 as the match to intonation
only when it is present in the sentence, then we would have reasonable grounds
to support our stimulus-based hypothesis of the phenomenon. Otherwise, if
subjects continue to prefer Tone 1 to other candidate tones when that tone is
omitted from the sinusoidal sentence, then we would necessarily conclude that
intonation occurred through a form of auditory induction, however similar this
induced pattern would be to the pitch contour of Tone 1. Experiment 3
evaluated this possibility by presenting a test of differential similarity in
which the sentences to be matched contained either the three tones
corresponding to the first three formants, or merely the tones corresponding to the
second and third formants, omitting the first.

Method

Subjects. Twenty-one listeners were selected as before from the student
population of Barnard and Columbia Colleges. None had participated previously
in experiments of this nature. They were paid for their participation.

Stimuli. The three-tone sinusoidal replicas of the sentence "I read a
book today" prepared in Experiment 2 provided the basis for all stimuli in
this test. Three versions of the sentence were used. The first was the
uptilted amplitude replica, in which the tone amplitudes were the inverse of
the natural case of formants. Tone 1 had the least power, and Tone 3 the
most. The second sentence was the pattern consisting only of Tones 2 and 3 of
this replica. This two-tone pattern was equated informally, by the authors,
for loudness equal to the three-tone pattern. Note that Tone 1 is omitted
from this pattern. The third sentence was the three-tone replica, preserving
the natural amplitude relations among the tones but presented at low power,
again to serve as a check on the outcome. The three single-tone patterns from
Experiment 2 were used as alternative pitch contours in this test of
differential similarity.

Procedure. Listeners were instructed to identify the sentence melody of
the sinusoidal sentence presented first on each trial, and then to select the
better match to that sentence melody from the two lagging single-tone
alternatives. Subjects were urged not to omit judgments. The choices were scored in
pencil in specially prepared response booklets.

There were three different combinations of alternatives, counterbalanced
for order of presentation: Tone 1 vs. Tone 2, Tone 1 vs. Tone 3, and Tone 2
vs. Tone 3. Each trial type was presented twenty times in random order with
each of the two sentence versions, Three-Tones and Tones 2 and 3. A third
sentence, Normal-Quiet, occurred twelve times in this test paired only with
Tone 1 vs. Tone 2 alternatives. The test, then, consisted of 132 trials.
Within a trial, the three patterns were separated by intervals of 1 s. Trials
were separated by 3 s, with 8 s between blocks of 12 trials.
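The trial total reported above can be checked with a short calculation. The following is only an illustrative sketch; the function and parameter names are hypothetical and are not part of the original procedure.

```python
def experiment3_trial_count(pair_types=3, repetitions=20,
                            main_sentences=2, normal_quiet_trials=12):
    """Count the trials in the Experiment 3 test sequence.

    pair_types: the Tone 1/2, Tone 1/3, and Tone 2/3 comparisons.
    repetitions: presentations of each comparison per sentence version.
    main_sentences: the Three-Tones and Tones 2 and 3 versions.
    normal_quiet_trials: Normal-Quiet trials (Tone 1 vs. Tone 2 only).
    """
    main_trials = pair_types * repetitions * main_sentences
    return main_trials + normal_quiet_trials


total = experiment3_trial_count()
blocks = total // 12  # number of 12-trial blocks in the sequence
```

Under these values the main conditions contribute 120 trials and the Normal-Quiet check contributes 12, which fills the sequence with whole blocks of 12 trials.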

Results and Discussion

The results of the similarity judgments are shown in Figure 5. It is
clear that subjects once again selected Tone 1 when it occurred as a component
of the sentence. In the case of the sentence containing only Tones 2 and 3,
however, subjects instead preferred Tone 2 to Tone 1 as the best match for the
sentence intonation. This outcome corresponds to a highly significant
interaction term in the analysis of variance, F(2, 110) = 52.[illegible],
p < .001. Subjects also preferred Tone 2 when it was pitted against Tone 3 in
the context of the two-tone sentences. Overall, subjects reported that
sentence pitch was matched best by Tone 1 only when that tone was a component
of the sentence.

This third experiment is encouraging with respect to the hypothesis we
offered about sinusoidal intonation. Subjects appear to be treating these
anomalous signals in a manner similar to speech signals. It is as if the
segmental information is obtained from the formant-like frequency variation of
the tones, and intonational information is provided by the periodicity within
the dominance region. This occurs despite the congruence of these two kinds
of information in the pattern of frequency variation of Tone 1.

Figure 5. Results of Experiment 3. (A) Differential similarity data for the
three-tone sentence with flat rolloff. (B) Differential similarity
data from the two-tone and quiet-normal conditions.

However, to establish the appropriateness of this application of the
dominance region notion, we must perform one final test. This is necessitated
by the kind of evidence we have obtained so far on the predominance of Tone 1
in producing the apparent intonation. Although our experiments have shown
that listeners consistently judge this tonal component to be most like the
sentence melody of a sinusoidal utterance, we have not separated two aspects
of this tone within the three-tone pattern that composes a sentence. In the
three tests that we report, the tone corresponding to the first formant has
been both the tone within the dominance region and the tone with the lowest
frequency, overall, in the three-tone complex. Because of this fact, we
cannot distinguish empirically between the dominance region hypothesis and a
lowest-frequency component hypothesis. To do so requires a test in which the
subjects evaluate a sentence that contains tonal components falling in the
dominance region, below the dominance region (with frequencies < 400 Hz), and
above the dominance region (with frequencies > 1000 Hz). We can predict the
outcome based on Experiments 1-3: When subjects listen to such a sentence,
they should either attend to the tone within the critical frequency range for
perceiving intonation, which would encourage the dominance region explanation
that we have proposed; or they should prefer the lowest frequency tone,
which would falsify the dominance region hypothesis, though in a manner
consistent with the findings that we have noted throughout this investigation.
This test is the topic of Experiment 4.
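The contrast between the two hypotheses can be made concrete with a small sketch. It assumes that each tone can be summarized by one representative frequency, which is a simplification, since the tones vary in frequency over the course of the sentence. The 400 Hz and 1000 Hz bounds come from the text; the example frequencies and all names are hypothetical.

```python
# Bounds of the dominance region as stated in the text.
DOMINANCE_LOW_HZ = 400.0
DOMINANCE_HIGH_HZ = 1000.0


def classify_tone(frequency_hz):
    """Place a representative tone frequency relative to the dominance region."""
    if frequency_hz < DOMINANCE_LOW_HZ:
        return "below"
    if frequency_hz > DOMINANCE_HIGH_HZ:
        return "above"
    return "within"


def predicted_match(tones, hypothesis):
    """Return the tone each hypothesis predicts listeners will choose.

    tones: dict mapping tone name to a representative frequency in Hz.
    hypothesis: "dominance" or "lowest".
    """
    if hypothesis == "lowest":
        return min(tones, key=tones.get)
    if hypothesis == "dominance":
        inside = {name: f for name, f in tones.items()
                  if classify_tone(f) == "within"}
        # If several tones fall in the region, take the lowest of them.
        return min(inside, key=inside.get) if inside else None
    raise ValueError("unknown hypothesis: " + hypothesis)


# Hypothetical representative frequencies, for illustration only.
example_tones = {"Tone 0": 120.0, "Tone 1": 500.0,
                 "Tone 2": 1500.0, "Tone 3": 2500.0}
```

With a tone below the region added to the sentence, the two hypotheses no longer pick the same component, which is the logic of the test that follows.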

Experiment 4

The original rationale for the dominance region was that the auditory
system gets the stimulus for pitch where the harmonics are resolved the best.
At this juncture, we have shown the superiority of Tone 1 (corresponding to
the first formant) compared to simultaneously occurring tones with higher
frequencies. Additionally, the dominance region hypothesis predicts that
listeners should also reject tones falling below the dominance region. To
perform this test of the claim, we returned to the natural utterance of our
familiar test sentence, and analyzed its fundamental frequency pattern. From
this analysis, a new set of sinewave synthesis parameters was created to form
a tone with a pattern of frequency variation matching the natural fundamental
frequency contour. These values were used in combination with the three-tone
replica to generate a four-tone sentence, comprising a "fundamental frequency"
tone and the three "formant" tones, as well as the additional single-tone
alternative to use in the similarity test format.

In the four-tone sentence that subjects evaluated, the tone matching the
fundamental frequency contour falls below the dominance region. If the
likeness of the first formant tone to the apparent sinusoidal intonation is
based on its occurrence within the critical frequency range, then we may
expect listeners to reject the fundamental frequency tone no less consistently
than they have rejected the second and third formant tones in Experiments 1,
2, and 3. In other words, a tone representing a fundamental frequency pattern
from a natural utterance should ironically not provide information for
sentence melody in this case, despite the naturalness of its pattern of
variation and the appropriateness of its occurrence in the normal frequency
range of the fundamental frequency.

Method 

Subjects. Twenty-four listeners participated in this study. They each
reported a normal history of speech and hearing function, and had not
previously been introduced to synthetic speech or sinewave materials. Our
subjects were student volunteers who received course credit in exchange for
taking this brief test.

Stimuli. The sentence presented to subjects in this test was composed of
four tones: Tone 0, corresponding to the fundamental frequency (commonly
termed F0) and overall amplitude of the original natural utterance of "I read
a book today," on which we patterned the sinewave sentences reported in the
previous two experiments; and Tone 1, Tone 2, and Tone 3, each corresponding
to the pattern of center-frequency and amplitude variation of the first three
formants. The values of the fundamental of the natural utterance were
obtained by employing the cepstral method of pitch extraction on the sampled
data, and were converted to sinewave synthesis parameters by including
amplitude values varying in imitation of the overall energy of the natural
utterance. The pattern of frequency variation of Tone 0 is shown in Figure
6A. The four-tone pattern formed by combining Tone 0 with the three tones that
replicate formant variation preserved the natural spectral amplitude rolloff,
as shown in Figure 6B.
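The construction of a multi-tone pattern from frequency and amplitude tracks can be sketched as follows. This is a minimal illustration of sinewave synthesis by phase accumulation, not the synthesis program used for these stimuli; the sample rate, frame duration, and function names are assumptions, since the text does not report them.

```python
import math


def synthesize_tone(freqs_hz, amps, sample_rate=10000, frame_s=0.01):
    """Synthesize one time-varying sinusoid from per-frame parameters.

    freqs_hz, amps: one frequency (Hz) and one linear amplitude per frame.
    sample_rate, frame_s: assumed values, not taken from the source.
    Phase is accumulated across frames so the waveform stays continuous
    when the frequency changes.
    """
    samples = []
    phase = 0.0
    samples_per_frame = int(round(sample_rate * frame_s))
    for freq, amp in zip(freqs_hz, amps):
        step = 2.0 * math.pi * freq / sample_rate
        for _ in range(samples_per_frame):
            samples.append(amp * math.sin(phase))
            phase += step
    return samples


def mix_tones(tone_signals):
    """Sum equal-length tone signals sample by sample."""
    return [sum(values) for values in zip(*tone_signals)]
```

A four-tone sentence in this scheme would be `mix_tones` applied to four `synthesize_tone` outputs, one driven by the F0 track and three by the formant center-frequency tracks, while each single-tone alternative is one `synthesize_tone` output on its own.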

The test stimuli also included the four sinusoidal components realized as
single-tone patterns, to be used as alternative pitch stimuli in the
similarity test. Each of the tones was resynthesized in isolation and the four
were equated for loudness.

As before, the test sequence was recorded on audiotape, and presented to
listeners via playback and calibrated headsets. An average listening level of
72 dB SPL was used.

Procedure. A test of apparent similarity was again used in this
experiment. Each trial consisted of three sinusoidal patterns: first, the
four-tone sentence pattern, followed by two single-tone patterns. There were
six different trial types, exhausting the possible comparisons among the four
single-tone candidates: Tone 0 vs. Tone 1, Tone 0 vs. Tone 2, Tone 0 vs. Tone
3, Tone 1 vs. Tone 2, Tone 1 vs. Tone 3, and Tone 2 vs. Tone 3. Each was
presented in two orders to counterbalance the occurrence of alternatives.
Altogether, the test consisted of the six trial types presented 14 times each,
including counterbalancing, composing a sequence of 84 trials.

On each trial, subjects were instructed to identify the sentence melody
of the first sinusoidal pattern, and then to select the better match of the
two lagging alternative patterns. Omissions were discouraged. The judgments
were reported with pencil and paper using specially prepared booklets.
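The six comparisons and the 84-trial total follow directly from pairing four tones. The sketch below enumerates them; it assumes the 14 presentations of each comparison were split evenly between the two orders (7 each), which the text implies but does not state, and it leaves out any shuffling of trial order. The names are hypothetical.

```python
from itertools import combinations

TONES = ["Tone 0", "Tone 1", "Tone 2", "Tone 3"]


def build_trials(tones=TONES, presentations_per_pair=14):
    """List the (first alternative, second alternative) pair for every trial."""
    per_order = presentations_per_pair // 2  # assumed even split across orders
    trials = []
    for first, second in combinations(tones, 2):
        trials.extend([(first, second)] * per_order)
        trials.extend([(second, first)] * per_order)
    return trials


trials = build_trials()
pair_types = list(combinations(TONES, 2))  # the six unordered comparisons
```

Each trial in this list would be preceded by the four-tone sentence, so a listener hears three patterns per trial.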

Results and Discussion 

The histograms in Figure 7 describe the results of the similarity test.
Tone 1, corresponding to the first formant, was once again preferred to every
other candidate tone. Tone 2 was judged more similar to the intonation
pattern than was Tone 3, an unanticipated effect. And, most critically,
subjects rejected Tone 0 consistently when it was an alternative paired with
Tone 1, indicating that the impression of sentence melody is stable. These
results were confirmed in the analysis of variance of similarity scores,
F(5, 155) = 40.1, p < .001, and by Scheffé post hoc means tests.

The pattern of results of Experiment 4 clearly confirms the
appropriateness of the dominance region hypothesis for the phenomenon of
sinusoidal sentence intonation. In fact, the congruence of segmental and
intonational information in the sinusoidal case of Tone 1 permits us to
support a proposal about auditory analysis of natural speech: Fundamental
periodicity is represented in the auditory system based on harmonics detected
within the dominance region and not on attention to the fundamental itself.
Because Tone 1 occurs within the range of this normal region for detecting
periodicity in the waveform, it seems to be treated as the principal stimulus
for pitch perception.

General Discussion

Prosody is a perceptual dimension of utterances that is not caused by
variation in any single physical dimension of the acoustic signal. The
listener is likely to treat the duration, amplitude, and fundamental period of
portions of the speech signal as changes in the rhythm, meter, and
organization of the linguistic utterances that perception defines. One aspect
of prosody is intonation, or sentence melody. The problem for the theorist is
to identify the relations among the quite dissimilar physical ingredients that
produce impressions of intonation in some cases, but create impressions of
duration, or loudness, or lexical stress, or perhaps syntactic constituent
boundaries, in others. In addition to the effects of these physical
variables, perception of intonation has been viewed as a process that refers
to linguistic knowledge, because judgments of intonation often reflect lexical
properties (Lieberman, 1965; but see Lea, 1979).

Given the intricate interplay of physical and perceptual components in
prosodic perception, it seems anticlimactic to assert that the perception of
intonation is based principally on fundamental frequency, in some instances
necessarily so (Abramson, 1972). However, intonation is potentially
determined from integrated energy or from frequency variation in the third and
fifth formants in whispered sentences (Meyer-Eppler, 1957), which lack
contours of fundamental frequency. As such, the whispered utterance is the
most reasonable precedent for sinusoidal sentence perception. A sinusoidal
replica also lacks a fundamental frequency of excitation common to its tonal
components, and therefore we might have expected it to be treated in a manner
similar to that of a whispered sentence. Instead, we found consistent
perceptual reliance on the portion of the signal within the dominance region
as the primary ingredient to intonation, much as occurs for normal utterances.

We cannot yet define a principle by which intonation is variously derived
from the fundamental, or the amplitude envelope, or the higher formant
frequency changes. Because our exploratory studies probed this phenomenon at
the sentence level, neither have we determined the extents of the likely
influence of duration, amplitude, and relative frequency change, on the one
hand, or of lexical access, constituent structure, and the encoding of
intonation in memory, on the other. Each of these factors may be suspected of
moderating the effect of fundamental frequency. Even if these other influences
are slight, we may nevertheless expect intonation to differ from the
fundamental frequency pattern (Hadding-Koch & Studdert-Kennedy, 1964). With
these cautions in mind, we propose that our investigation describes the
perceptual registration of the strongest influence on intonation, the
fundamental frequency.

The studies reported here lead us to conclude that speech signals are
analyzed for fundamental frequency in the dominance region, coincidentally the
region of the first formant, as Greenberg (1980) hypothesized on the basis of
studies of the strength of periodicity in auditory evoked potentials with
synthetic vowels. It is somewhat ironic that sinusoidal signals, clearly
unnatural in vocal timbre, provide evidence on this question. But, if the
auditory system ordinarily detects periodicity from the harmonics in the
dominance region, then when it fails to find harmonics it seems nevertheless
to represent the pitch of a complex signal by its period in this region. A
sinusoidal sentence is a kind of exceptional stimulus that tests the rule, and
confirms it.

Is the intonation of sinusoidal sentences the result of periodic acoustic
structure subsequently transformed by duration and loudness (or by segmental
and morphological structure)? If sinusoidal signals and natural speech
are analyzed in a common manner, as we claim, then we may certainly expect
sinusoidal intonation to be affected by acoustic and linguistic properties
besides frequency of the tone in the critical range. For the present, though,
the evidence suggests that the primary correlate of sinusoidal intonation is
the tone that reproduces the frequency variation of the first formant. And,
while this outcome is revealing about the perception of natural speech, it
also supports the contention that sinusoidal replicas of utterances are
perceived like ordinary phonetic signals.

Footnotes

¹A pure tone is not a formant. A sinusoid is defined by the function
y = a sin x, and may occur at any frequency within the audible range. A
formant is a natural resonance of the vocal tract, and its frequency is
defined as the peak of the spectrum envelope drawn to enclose the harmonics
produced by the excitation of the vocal tract (Fant, 1956). Although we have
constructed sinusoids that imitate the pattern of formant center-frequency
variation, they do not also imitate the acoustic structure of formants, by
this definition. For a basic discussion of the physical acoustics of speech,
see Joos (1948).

²The intonation of a sentence is its pitch contour (Catford, 1977),
though this definition is perceptually troublesome. This is so because the
term pitch is traditionally used to refer to that perceptual impression
correlated with fundamental frequency. Intonation is also correlated mainly
with fundamental frequency, although pitch applies to speech and nonspeech,
and intonation more narrowly applies to speech exclusively. In view of this,
is sentence intonation the product or the equivalent of sentence pitch
contour? The fact that aspects of signal duration and power intrude on the
perception of both intonation and pitch argues that both terms name the same
attribute. The influence of lexical structure in judging sentence melody
argues against any simple equivalence, although it by no means warrants that a
separate auditory impression of pitch contributes to the impression of
intonation. (Linguists have occasionally combined the analysis of intonation
and word stress [reviewed by Lieberman, 1967], although to do so does not
dismiss the phenomenon of sentence pitch; it simply adds another problem to
consider.) Our present use of the terms, then, refers to the fact that
sentence "pitch contour," sentence "melody," and sentence "intonation" seem to
indicate the same aspect of spoken sentences, although its perceptual
derivation is difficult to resolve.
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